Preface 


This work began in October 2005 when I started teaching Coding Theory. 
Coding and cryptography are similar, but the main concern of the former is 
the presence of a noisy environment, whereas that of the latter is the 
secrecy of the message from unintended parties. The concept of entropy, being 
fundamental for the purposes of economy, privacy and reliability, plays a role 
in all branches of both subjects. 


In this course the main theme was coding theory. Cryptography was only 
mentioned briefly towards the end. As the students were fairly familiar with 
algebra, but were familiar with neither finite fields nor polynomial rings, we 
had some practice sessions where the students tried their hands at problems. 
In all we had two practice sessions, on 6 and 13 January, three quizzes, on 20 January, 
3 and 10 February, and one midterm exam on 27 January 2006. Our final 
exam was on 23 February 2006. 


The projections were adapted from the hand-outs given to students for 
the lectures. These hand-outs form the lecture notes here. Both the notes 
and the projections are written in plain TeX. I stopped making projections 
after the lecture on Linear Code on 9 December 2005, because 
the nature of the activities we did in class had changed. We spent a fair amount 
of our time doing exercises and problems, so the lecturing was shortened. 
And since the students were by then more familiar with the subject, I needed 
only to guide them through the hand-outs. Another reason was that I felt 
that I had been producing too many of them, and I did not want to waste 
more paper. To do a similar thing for other subjects in the future, it would 
probably be wise to limit these projections to under 20 pages 
for each lecture. 


I had been teaching at Mahidol University. But now I have been told by 
the heads of the department of mathematics there that they are going to fire me 
because, for one thing, I work too hard, and for another, I always teach in English. 
To counter that: English is my first language, and I work for God; therefore I 
work day and night with a minimum of sleep for Him, readily and quite 
happily. I can see no reason why this should make anybody unhappy. I do 
not even believe in politics; I only believe in God. 


I thank my students for allowing me the privilege of teaching them. I 
hope they have learnt from me as much as I have from them. 


Kit Tyabandha, PhD 
Bangkok, 14th January, 2007 
Error and distance 
28th October 2005 


[Diagram: message source → (message) → source encoder → (code word) → channel encoder → (code word with redundancy) → channel, subject to noise → (received vector) → channel decoder → (decoded word) → source decoder → (decoded message) → receiver.]


Figure 1 Encoding and decoding of a message. The channel encoder 
introduces redundancy, which the channel decoder uses to detect and correct 
errors. 


The criteria for designing a channel encoding algorithm, and for constructing 
the encoder and the decoder, are fast encoding and decoding of 
messages, easy transmission of encoded messages, maximum rate of transfer 
of information, and maximum detection or correction capability. 


Definition 1. Let $A = \{a_1, \ldots, a_q\}$ be a code alphabet of size $q$, whose 
elements are the code symbols. A q-ary word of length $n$ over $A$ is a 
sequence $w = w_1 \cdots w_n$, or equivalently a vector $(w_1, \ldots, w_n)$, where $w_i \in A$ 
for all $i$. A q-ary block code of length $n$ over $A$ is a nonempty set $C$ of 
q-ary words, called code words, all of which have the same length $n$. The 
number of code words $C$ contains is the size $m$ of $C$, so that $m = |C|$. 
The information rate of the code $C$ is $(\log_q |C|)/n$. An $(n,m)$-code is 
a code of length $n$ and size $m$. 


§ 


From Definition 1 we can see that C is a code containing code words, each 
of which is composed of symbols from the code alphabet. A q-ary block code 
is a set of q-ary code words. 


Example 1. A code over the code alphabet $\mathbb{F}_2 = \{0,1\}$ is called a binary 
code, and one over $\mathbb{F}_3 = \{0,1,2\}$ is called a ternary code. The term quaternary 
code refers to a code over either $\mathbb{F}_4 = \{0,1,2,3\}$ or $\mathbb{Z}_4 = \{0,1,2,3\}$. 


Definition 2. A communication channel consists of a finite channel alphabet 
$A = \{a_1, \ldots, a_q\}$ together with a set of forward channel probabilities $p_{a_i a_j}$ such 
that for all $i$ 
$$\sum_{j=1}^{q} p_{a_i a_j} = 1$$
where $p_{a_i a_j}$ is the conditional probability that $a_j$ is received, given that $a_i$ is 
sent. If $x$ is the word received when a word $c$ was sent, $e$ is the number of 
places where $x$ and $c$ differ, and $n$ is the length of each word, then over a binary 
symmetric channel with cross-over probability $p$ (Example 2) the forward 
channel probability is $p_{cx} = p^e(1-p)^{n-e}$. 

§ 


The probability $p_{ab}$ is normally written $p(b \text{ received} \mid a \text{ sent})$. 


Definition 3. Let $c = c_1 \cdots c_n$ and $x = x_1 \cdots x_n$ be words of length $n$. 
Then a communication channel is said to be memoryless if 
$$p_{cx} = \prod_{i=1}^{n} p_{c_i x_i}$$


§ 


Definition 3 tells us that a communication channel is memoryless if the 
outcome of any one transmission is independent of the outcome of the previous 
transmissions. 


Definition 4. A memoryless channel with a channel alphabet of size q is 
called a q-ary symmetric channel if each symbol transmitted has the same 
probability $p < \frac{1}{2}$ of being received in error, and whenever a wrong symbol 
is received, each of the q − 1 possible errors is equally likely. If $p > \frac{1}{2}$, the 
channel is known to be useless. 


§ 


Example 2. The binary symmetric channel (BSC) is a memoryless channel 
having the channel alphabet {0, 1} and channel probabilities $p_{01} = p_{10} = p$ and 
$p_{00} = p_{11} = 1 - p$. This probability p of a bit error in a BSC is called the 
cross-over probability of the BSC. 


Example 3. When a received word x is not in the code, the most likely 
code word sent is the code word $c_i$ whose $p_{c_i x}$ is maximum over all 
$i = 1, \ldots, m$. A rule for finding the most likely code word sent in case of an 
error is called a decoding rule. 


The procedure for finding the most likely message sent is described in 
Algorithm 1. Here ĉ denotes the word deduced, to the best of our guess, to be 
the code word actually sent. 


Algorithm 1 Decoding algorithm 

for all words x_i received do 
    if x_i is not a valid code word then 
        ĉ ← the most likely code word c_j sent, according to the decoding rule used 
    else 
        ĉ ← x_i 
    endif 
endfor 
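The following is a minimal Python sketch of Algorithm 1. It assumes a binary symmetric channel with cross-over probability p < 1/2 (Example 2) and represents words as strings of equal length. The names `bsc_likelihood` and `decode` and the `complete` flag are hypothetical, chosen for illustration. Ties are handled as in Example 4. Here, the arbitrary choice made by complete decoding is taken to be the first tied word in sorted order.

```python
from __future__ import annotations


def bsc_likelihood(c: str, x: str, p: float) -> float:
    """Forward channel probability p_cx = p^e (1 - p)^(n - e) on a BSC."""
    e = sum(ci != xi for ci, xi in zip(c, x))  # places where c and x differ
    return p**e * (1 - p) ** (len(c) - e)


def decode(x: str, code: set[str], p: float, complete: bool = True) -> str | None:
    """Algorithm 1 with the maximum likelihood rule (hypothetical helper)."""
    if x in code:  # a valid code word is accepted as received
        return x
    likelihood = {c: bsc_likelihood(c, x, p) for c in code}
    best = max(likelihood.values())
    tied = sorted(c for c in code if likelihood[c] == best)
    if len(tied) > 1 and not complete:
        return None  # incomplete decoding: ask for a retransmission
    return tied[0]  # complete decoding: one tied word, chosen by sorted order


C = {"001", "100", "110", "111"}
print(decode("010", C, 0.1))                  # 110
print(decode("000", C, 0.1))                  # 001 (tied with 100)
print(decode("000", C, 0.1, complete=False))  # None
```

Words at the same distance from x get identical likelihoods, so ties are detected exactly. Over a BSC with p < 1/2 the tied words are those nearest to x.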
Definition 5. Maximum likelihood decoding decodes a received word $x$ to a 
code word $c_x$ such that $p_{c_x x} = \max_{c \in C} p_{cx}$. 


§ 


Example 4. There are two kinds of maximum likelihood decoding. When 
more than one code word has the same maximum likelihood, complete 
maximum likelihood decoding chooses one of them arbitrarily, while incomplete 
maximum likelihood decoding rejects all of them and asks for a retransmission. 


§ 


Exercise 1. Code words from the binary code {00001, 00111, 02020, 00001} 
are sent over a binary symmetric channel with the cross-over probability p = 
0.002. Using the maximum likelihood decoding rule, decode the words 01111, 
01110, 11000, 10101 and 11111. 


§ 


Exercise 2. Write the IMLD (incomplete maximum likelihood decoding) 
table for the code C = {001,100,110,111}, and then again for the code 
C = {101, 111, 110}. 

§ 


Exercise 3. Write the CMLD (complete maximum likelihood decoding) 
table for the code C = {110, 101,011,001, 100}, and then again for the code 
C = {000, 111, 010, 101}. 

§ 


Exercise 4. Consider a memoryless binary channel with channel probabilities 
$p_{00} = 0.81$ and $p_{11} = 0.95$. Code words from the code C = {000, 001, 011, 111} 
are sent over the channel. With the help of the maximum likelihood 
decoding rule, decode the words 010, 101, 100 and 110. 


§ 


Exercise 5. A ternary code is C = {01202, 21201, 11220, 00112}. Using the 
nearest neighbour decoding rule, decode the words 01112, 02221, 12121 and 
01012. 


§ 


Definition 6. Let $x = x_1 \cdots x_n$ and $y = y_1 \cdots y_n$ be words of length $n$ 
over an alphabet $A$. Then the Hamming distance between $x$ and $y$, denoted 
$d(x, y)$, is the number of places where $x$ and $y$ differ from each other, 
that is, 
$$d(x, y) = d(x_1, y_1) + \cdots + d(x_n, y_n)$$
where 
$$d(x_i, y_i) = \begin{cases} 1 & \text{if } x_i \neq y_i \\ 0 & \text{if } x_i = y_i \end{cases}$$

Theorem 1. Over a binary symmetric channel with cross-over probability $p$, the 
Hamming distance $d(x, c) = i$ corresponds to the forward 
channel probability $p_{cx} = p^i(1-p)^{n-i}$. 


Proof. This follows directly from Definitions 2 and 6. □ 


Example 5. From Definition 6 it follows that $0 \le d(x,y) \le n$; $d(x,y) = 0$ 
if and only if $x = y$; and $d(x, y) = d(y, x)$. 


Example 6. Let A be the roman alphabet. If x = ‘breed’, y = ‘bread’, and 
z = ‘break’, then d(x, y) = d(y,z) = 1, and d(x,z) = 2. On the other hand 
if A = {0,1,2,3,4,5,6}, p = 24601 and q = 54321, then d(p,q) = 3. 
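Definition 6 translates directly into a short function. The sketch below reproduces the distances computed in Example 6; the name `hamming_distance` is our own. It works for any alphabet, since it only compares symbols for equality.

```python
def hamming_distance(x: str, y: str) -> int:
    """Number of places in which two words of equal length differ (Definition 6)."""
    if len(x) != len(y):
        raise ValueError("words must have the same length n")
    return sum(1 for xi, yi in zip(x, y) if xi != yi)


# The words of Example 6
print(hamming_distance("breed", "bread"))  # 1
print(hamming_distance("bread", "break"))  # 1
print(hamming_distance("breed", "break"))  # 2
print(hamming_distance("24601", "54321"))  # 3
```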


Theorem 2. Let x, y and z be words of length n over A. Then the 
triangle inequality holds for their mutual Hamming distances, that is, 
$$d(x,z) \le d(x,y) + d(y,z)$$


Proof. Let $a = d(x,z)$, $b = d(x,y)$, and $c = d(y,z)$. We have $a \ge 0$, $b \ge 0$ 
and $c \ge 0$. What this theorem states is obvious when $a = 0$. If $a > 0$, then 
either $b = 0$ or $b > 0$; if the former is the case, that is $b = 0$, then $x = y$, so $a = c$ and 
the theorem is true. If both $a > 0$ and $b > 0$, then either $c = 0$ or $c > 0$; if 
$c = 0$, then $y = z$, so $a = b$ and the theorem is again true. But if $a > 0$, $b > 0$ and 
$c > 0$, then $a$, $b$ and $c$ may arise partly from positions of difference held in common, as 
shown in the Venn diagram of Figure 2. 

Figure 2 Common differences among a, b and c. 
[Venn diagram of three overlapping regions a, b and c.] 

Consider any position $i$ in which $x$ and $z$ differ. Since $x_i \ne z_i$, the symbol $y_i$ 
cannot be equal to both $x_i$ and $z_i$, so this position is counted in $d(x,y)$, in 
$d(y,z)$, or in both. Hence every difference that contributes to $d(x,z)$ also 
contributes at least once to $d(x,y) + d(y,z)$, which is therefore never less than 
$d(x,z)$. This exhausts all the cases and the theorem is proved. □ 
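The theorem can also be checked exhaustively in small cases. The following sketch is a sanity check rather than a proof. It tests every triple of words of length n over the alphabet {0, ..., q − 1}, an assumed labelling of the q symbols, for two small choices of q and n.

```python
from itertools import product


def hamming_distance(x, y):
    """Number of positions in which x and y differ."""
    return sum(xi != yi for xi, yi in zip(x, y))


def triangle_inequality_holds(q: int, n: int) -> bool:
    """Check d(x, z) <= d(x, y) + d(y, z) for all q-ary words of length n."""
    words = list(product(range(q), repeat=n))
    return all(
        hamming_distance(x, z) <= hamming_distance(x, y) + hamming_distance(y, z)
        for x in words
        for y in words
        for z in words
    )


print(triangle_inequality_holds(2, 4))  # True: 16^3 triples
print(triangle_inequality_holds(3, 3))  # True: 27^3 triples
```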
Definition 7. The minimum distance or nearest neighbour decoding rule 
decodes $x$ to $c_x$ if $d(x, c_x) = \min_{c \in C} d(x, c)$. 
§ 


Exercise 6. Consider the binary code C = {001, 010, 100}. If code words are sent 
over a memoryless binary channel whose channel probabilities are $p_{00} = 0.15$ 
and $p_{11} = 0.5$, use the maximum likelihood decoding rule to decode the word 
111. Then decode 111 again using the nearest neighbour decoding rule. 


§ 


Exercise 7. Our binary code is C = {01010, 10101, 10011, 00110}. Using the 
NN (nearest neighbour) decoding rule, decode the words 00000, 11111, 01001, 
11011 and 00100. 


§ 
Theorem 3. The maximum likelihood decoding rule and the minimum 
distance decoding rule are the same for a BSC with cross-over probability 
$p < \frac{1}{2}$. 


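Proof. By Theorem 1, a received word at distance $i$ from a code word has 
likelihood $p^i(1-p)^{n-i}$. When $p < \frac{1}{2}$ we have $p/(1-p) < 1$, so each unit 
increase in distance multiplies the likelihood by a factor less than 1, giving 
$$p^0(1-p)^n > p^1(1-p)^{n-1} > \cdots > p^n(1-p)^0$$
Thus the smaller the distance, the greater the likelihood, and the theorem is 
proved. □ 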
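The sketch below illustrates Theorem 3 numerically for an assumed p = 0.1, using the code of Exercise 7. First, it checks that the likelihoods $p^i(1-p)^{n-i}$ strictly decrease with i. Second, it checks that for every binary word of length 5 the maximum likelihood rule and the nearest neighbour rule pick the same set of code words. The function names are hypothetical.

```python
from itertools import product


def hamming_distance(x, y):
    return sum(xi != yi for xi, yi in zip(x, y))


def ml_choices(x, code, p):
    """Code words c maximising the BSC likelihood p^d (1 - p)^(n - d), d = d(x, c)."""
    n = len(x)
    lik = {c: p ** hamming_distance(x, c) * (1 - p) ** (n - hamming_distance(x, c)) for c in code}
    best = max(lik.values())
    return {c for c in code if lik[c] == best}


def nn_choices(x, code):
    """Code words c minimising d(x, c)."""
    dist = {c: hamming_distance(x, c) for c in code}
    best = min(dist.values())
    return {c for c in code if dist[c] == best}


p, n = 0.1, 5
likelihoods = [p**i * (1 - p) ** (n - i) for i in range(n + 1)]
print(all(a > b for a, b in zip(likelihoods, likelihoods[1:])))  # True

code = {"01010", "10101", "10011", "00110"}  # the code of Exercise 7
received = ("".join(w) for w in product("01", repeat=n))
print(all(ml_choices(x, code, p) == nn_choices(x, code) for x in received))  # True
```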


Definition 8. Let C be a code containing at least two words. Then, the 
minimum distance or the distance of C is 


$$d(C) = \min\{d(x, y) \mid x, y \in C,\ x \ne y\}$$


A code of length n, size m, and distance d is called an (n,m, d)-code. 


Exercise 8. Consider an (n, a, n)-code where n > 2. Find the value of a. 
§ 


Definition 9. Let a code word be of length n. Then an error vector of 
weight k is a word of length n that takes the value 1 in each of the k positions 
where errors occurred and 0 in all the remaining positions. An error vector is 
also called an error word or an error pattern. 


§ 


Definition 10. An error vector $e$ is said to be detected by a code if $a + e$ is 
not a code word for any code word $a$. If there exists some code word $a$ such 
that $a + e$ is also a code word, we say that the error vector $e$ goes undetected. 


§ 


Definition 11. Let a received word x differ from the code word c actually 
sent by e errors. Then the corresponding code C is said to be u-error-detecting 
if x is not a code word whenever $1 \le e \le u$. Moreover, C is exactly 
u-error-detecting if it is u-error-detecting but not (u + 1)-error-detecting. 
§ 
Theorem 4. A code C is u-error-detecting if and only if $d(C) \ge u + 1$. 


Proof. Let $c \in C$. If $d(C) \ge u + 1$, then any $x$ such that $1 \le d(x, c) \le u < d(C)$ 
satisfies $x \notin C$; therefore C is u-error-detecting. On the other hand, 
if $d(C) < u + 1$, that is $d(C) \le u$, then there exist $c_1, c_2 \in C$ such that 
$1 \le d(c_1, c_2) = d(C) \le u$. It is then possible to send $c_1$ and incur 
errors such that $x = c_2$ is received, with $1 \le d(x, c_1) = d(c_2, c_1) \le u$; hence C is not a 
u-error-detecting code. □ 


Corollary 4[1]. A code with distance d is exactly (d — 1)-error-detecting. 
§ 


Definition 12. Let v be a positive integer, and assume that the incomplete 
decoding rule is used. Then a code C is said to be v-error-correcting if 
minimum distance decoding can correct up to v errors in any code word sent. It is said to 
be exactly v-error-correcting if it is v-error-correcting but not (v + 1)-error-correcting. 


§ 
Theorem 5. A code C is v-error-correcting if and only if $d(C) \ge 2v + 1$. 


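Proof. Suppose that $d(C) \ge 2v + 1$. Let $c \in C$ be the code word sent, $x$ 
the word received, and suppose $e \le v$ errors occurred. Then $d(x, c) \le v$, 
and if C were not v-error-correcting there would have to be some distinct $c_1, c_2 \in C$ such 
that $d(x, c_1) + d(x, c_2) \le 2v$. But by the triangle inequality (Theorem 2), 
$d(x, c_1) + d(x, c_2) \ge d(c_1, c_2) \ge d(C) \ge 2v + 1$ for all distinct $c_1, c_2 \in C$, so C must be 
v-error-correcting. 

Next, suppose that C is v-error-correcting and $d(C) < 2v + 1$. Then 
$d(C) \le 2v$, that is to say, there exist distinct $c_1, c_2 \in C$ such that $d(c_1, c_2) \le 2v$. 
Let $x$ be obtained from $c_1$ by changing $\lceil d(c_1, c_2)/2 \rceil$ of the positions where $c_1$ and $c_2$ 
differ so that they agree with $c_2$. Then $d(x, c_1) + d(x, c_2) = d(c_1, c_2) \le 2v$, with 
$d(x, c_1) \le v$ and $d(x, c_2) \le d(x, c_1)$. If $c_1$ is sent and $x$ is received, at most $v$ errors 
have occurred, yet $x$ is at least as close to $c_2$ as to $c_1$, so these errors cannot be corrected; 
hence C is not v-error-correcting. This contradicts what we have supposed 
earlier, therefore necessarily $d(C) \ge 2v + 1$. □ 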


Corollary 5[1]. A code with distance d is an exactly $\lfloor (d-1)/2 \rfloor$-error-correcting 
code, where $\lfloor x \rfloor$ is the greatest integer less than or equal to $x$. 


§ 
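As a small illustration of Definition 8 and Corollaries 4[1] and 5[1], the sketch below computes d(C) by comparing every pair of code words. It then reports how many errors the code detects exactly and corrects exactly. The function names are our own, and the brute-force pairwise search is only practical for small codes.

```python
from itertools import combinations


def hamming_distance(x, y):
    return sum(xi != yi for xi, yi in zip(x, y))


def minimum_distance(code):
    """d(C) as in Definition 8; the code must contain at least two words."""
    return min(hamming_distance(x, y) for x, y in combinations(code, 2))


def capabilities(code):
    """Return (d, u, v): d(C), exact detection d - 1, exact correction floor((d - 1) / 2)."""
    d = minimum_distance(code)
    return d, d - 1, (d - 1) // 2


print(capabilities(["00000", "11111"]))                    # (5, 4, 2)
print(capabilities(["01010", "10101", "10011", "00110"]))  # (2, 1, 0)
```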
Exercises for Error and distance 
14th January, 2007 


1. Binary code words from the code {000,010, 101,110, 111} are sent over a 
binary symmetric channel (BSC) with cross-over probability p = 0.02. Decode 
using the maximum likelihood decoding rule the following words: 001, 011 
and 001. 
2. Let a memoryless binary channel have channel probabilities $p_{00} = 0.8$ 
and $p_{11} = 0.9$, where $p_{ij}$ is the probability that $j$ is received when $i$ is sent. 
Suppose the code words being sent over this channel are from the binary code 
{000, 100, 110,011, 111}. Decode the words 001, 010 and 101 with the use of 
the maximum likelihood decoding rule. 
3. Consider a binary code C = {010, 110, 101}. 
a. Use the nearest neighbour decoding rule to decode the received word 000. 
b. Let our channel be binary and memoryless with the probabilities $p_{00} = 
0.2$ and $p_{11} = 0.5$. Decode the received word 000 using the maximum 
likelihood decoding rule. 
4. Using the nearest neighbour decoding rule for the binary code 


C = {10110, 11000, 10100, 10011, 11011} 


decode the received words 00000, 00011, 01101, 01111 and 10011. 
5. Use the nearest neighbour decoding rule for the ternary code 


C = {01122, 10021, 20210, 22200} 


to decode the received words 00122, 12001, 20111 and 22000. 

6. Construct the incomplete maximum likelihood decoding (IMLD) table for 
the binary code C = {000, 010, 101, 110, 111}. 

7. Find the number of binary (n, 2, n)-codes, n > 2, where for an (n, m, d)-code 
n is the length of the code words, m the size of the code and d the distance 
of the code. 

8. Consider the binary repetition code of length 6 sent over a binary sym- 
metric channel which has symbol error probability p. Find the word error 
probability of the code. 

9. Consider q-ary (3, m, 2)-codes, where q > 2. Find the range of values which m may 
take. 

10. Let $A_q(n,d)$ denote the largest value of m such that there exists a 
q-ary (n, m, d)-code. Find $A_q(n,1)$ and $A_q(n,n)$. Then find $A_q(3,2)$ for any 
integer q > 2. 

11. Find the upper bound of m for the q-ary (q + 1, m, 3)-code. 

12. Consider a balanced block (b, v, r, k, λ)-design, where b is the number of 
subsets $B_i$ of a set S of v elements, each point appears in exactly r blocks, 
each block comprises exactly k points, and each pair of points occurs together 
in exactly λ blocks. Here the $B_i$ are called blocks, and S is said to contain v varieties. 
Show that $bk = vr$ and $r(k - 1) = \lambda(v - 1)$. 
13. A permutation of a set S is a one-to-one mapping from S to itself. Two 
q-ary codes are said to be equivalent to each other if one can be obtained 
from the other by a permutation of the positions of the code, or a permutation 
of the symbols appearing in a fixed position, or any combination of both. 
Show that the binary codes $C_1 = \{00100, 00011, 11000, 11111\}$ and $C_2 = 
\{00000, 01101, 10110, 11011\}$ are equivalent. Then show that the ternary code 
$C_3 = \{012, 120, 201\}$ is equivalent to the ternary repetition code of length 3, 
$C_4 = \{000, 111, 222\}$. 

14. Prove that a sphere of radius r in $\mathbb{F}_q^n$, $0 \le r \le n$, contains exactly 
$$\binom{n}{0} + \binom{n}{1}(q-1) + \binom{n}{2}(q-1)^2 + \cdots + \binom{n}{r}(q-1)^r$$
words. 
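Before attempting the proof in Exercise 14, the formula can be checked by brute force for small parameters. This is a numerical check, not a proof. The sketch labels the q symbols 0, ..., q − 1, since only their number matters here, and counts the words within distance r of the all-zero word.

```python
from itertools import product
from math import comb


def sphere_size_formula(q, n, r):
    """Sum over i = 0..r of C(n, i) (q - 1)^i."""
    return sum(comb(n, i) * (q - 1) ** i for i in range(r + 1))


def sphere_size_brute_force(q, n, r):
    """Count q-ary words of length n within Hamming distance r of 00...0."""
    return sum(
        1
        for w in product(range(q), repeat=n)
        if sum(symbol != 0 for symbol in w) <= r
    )


for q, n, r in [(2, 5, 2), (3, 4, 2), (4, 3, 3)]:
    print(q, n, r, sphere_size_formula(q, n, r), sphere_size_brute_force(q, n, r))
# 2 5 2 16 16
# 3 4 2 33 33
# 4 3 3 64 64
```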
Entropy and mutual information 
4th November 2005 


Definition 13. A probability space is a triple (S, B, P) on the domain S, 
which is a nonempty set called the sample space, where (S, B) is a measurable 
space, B is a Borel field of subsets of S, and P is a measure on B with the 
property that P(S) = 1 and, for all disjoint $E_i \in B$, 
$$P\left(\bigcup_i E_i\right) = \sum_i P(E_i)$$
In other words, P is a nonnegative function defined for all events $E_i \in B$, the 
events being the B-measurable subsets of S. Further, a random variable X is a function mapping 
S into some set R, called the range of X. For convenience, we shall also use X 
to represent both the function and its own range, that is, X is a function which 
maps S into X. If S is discrete and f is some real-valued function defined on X, 
then X and f(X) are two different random variables, and the expectation 
of the latter is given by 
$$E[f(X)] = \sum_x p(x) f(x)$$


§ 


Definition 14. Let p(x) be the probability that $x \in X$ occurs, similarly 
p(y) that $y \in Y$ does, while p(x, y) is the probability that both $x \in X$ and $y \in Y$ occur. 
Then, 
$$p(x \mid y) = \frac{p(x, y)}{p(y)} \tag{1}$$
and 
$$p(y \mid x) = \frac{p(x, y)}{p(x)} \tag{2}$$


Definition 15. A Markov chain is a sequence of random variables $X_t$, where 
$t = 0, 1, \ldots$, such that, 
$$P(X_t = j \mid X_0 = i_0, \ldots, X_{t-1} = i_{t-1}) = P(X_t = j \mid X_{t-1} = i_{t-1})$$


In other words, given the present state, the next state is conditionally inde- 
pendent of the past. 


§ 


Definition 16. A subset $K \subseteq E^n$, where $E^n$ is the Euclidean space of n 
dimensions, is called convex if the line segment joining any two points in K 
is contained in K. If the two points are $x_1$ and $x_2$, then the line segment 
joining them is the set of points $x = t x_1 + (1 - t) x_2$, where $0 \le t \le 1$. 


§ 


Definition 17. A point x is said to be a convex combination of points 
$x_1, \ldots, x_m$ if there exist nonnegative scalars $\alpha_1, \ldots, \alpha_m$ such that $\sum_i \alpha_i = 1$ 
and $\sum_i \alpha_i x_i = x$. The set of all convex combinations of $x_i$, $i = 1, \ldots, m$, is 
called the convex hull of $\{x_i\}$. 


§ 


Definition 18. Let f be a real-valued function, and let K be a convex subset 
of the domain of f. Then f is said to be convex cup if, for every $x_1, x_2 \in K$ 
and $0 < t < 1$, 
$$f(t x_1 + (1-t) x_2) \le t f(x_1) + (1-t) f(x_2) \tag{3}$$
It is said to be strictly convex cup if strict inequality holds in Equation 3 
whenever $x_1 \ne x_2$. Similarly, f is said to be convex cap if 
$$f(t x_1 + (1-t) x_2) \ge t f(x_1) + (1-t) f(x_2) \tag{4}$$
that is to say, if −f is convex cup. It is said to be strictly convex cap if strict 
inequality holds in Equation 4 whenever $x_1 \ne x_2$. Convex cap is also known 
as concave. Geometrically speaking, f is convex cup if and only if all its chords 
lie above or on the graph of f, and f is concave if and only if all its chords lie 
below or on the graph of f. 


§ 


Definition 19. Let K be some interval in $E^1$, and let F(x) be a probability 
distribution concentrated on K such that $P(X \le x) = F(x)$. Then, if the 
expectation E(X) exists, and if f(x) is a convex cup function, 
$$E(f(X)) \ge f(E(X)) \tag{5}$$
If f is strictly convex cup, then strict inequality holds in Equation 5 unless X is 
constant with probability 1. Similarly, if f is convex cap, then 
$$E(f(X)) \le f(E(X)) \tag{6}$$
If f is strictly convex cap, then strict inequality holds in Equation 6, again unless 
X is constant with probability 1. 


§ 
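Here is a quick numerical illustration of Equations 5 and 6. It assumes X is uniform on {1, 2, 3, 4}, an arbitrary choice, and uses the convex cup function f(x) = x² and the convex cap function f(x) = ln x.

```python
import math

values = [1, 2, 3, 4]
probs = [0.25, 0.25, 0.25, 0.25]  # X uniform on {1, 2, 3, 4}


def expect(f):
    """E(f(X)) for the discrete distribution above."""
    return sum(p * f(x) for x, p in zip(values, probs))


mean = expect(lambda x: x)  # E(X) = 2.5

# Equation 5 with the convex cup function f(x) = x^2: E(f(X)) >= f(E(X))
print(expect(lambda x: x * x), mean**2)  # 7.5 6.25

# Equation 6 with the convex cap function f(x) = ln x: E(f(X)) <= f(E(X))
print(expect(math.log) <= math.log(mean))  # True
```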


Example 7. Suppose that, in Definition 19, a mass distribution is 
placed on the graph of f; then Equation 5 says that the overall centre of mass 
will lie above or on the graph, while Equation 6 says that it will lie below or on it. 


Entropy measures, as a single value, the uncertainty of a set of events. We 
derive it from Axioms 1 and 2. 
Axiom 1. If the events are all equally likely, then the uncertainty function 
$H\left(\frac{1}{m}, \ldots, \frac{1}{m}\right)$ is monotonically increasing with m. 
§ 


Axiom 2. If $\{E_1^1, \ldots, E_m^1\}$ and $\{E_1^2, \ldots, E_n^2\}$ are statistically independent 
sets of equally likely disjoint events, then the uncertainty of the set of events 
$\{E_i^1 \cap E_j^2 : 1 \le i \le m,\ 1 \le j \le n\}$ is 
$$H\left(\frac{1}{mn}, \ldots, \frac{1}{mn}\right) = H\left(\frac{1}{m}, \ldots, \frac{1}{m}\right) + H\left(\frac{1}{n}, \ldots, \frac{1}{n}\right)$$
That is to say, $h(mn) = h(m) + h(n)$, where $h(m) = H\left(\frac{1}{m}, \ldots, \frac{1}{m}\right)$. 


Definition 20. Let the set of m possible disjoint events be 
$$E = \{E_1, \ldots, E_m\}$$
We call $p(E_i)$ the a priori probability of $E_i$, where $1 \le i \le m$ and $\sum_{i=1}^m p(E_i) = 
1$. The uncertainty function, or entropy function, $H(p(E_1), \ldots, p(E_m))$ obeys 
Axioms 1 and 2. 

§ 


The entropy of a random variable X gives a measure of the amount of 
information obtained from an observation of X. It also represents the 
randomness of X and our uncertainty about X. The less probable an event is, the 
more information we receive when it occurs. 


Theorem 6. The entropy of a set of m equally likely events is $h(m) = 
\lambda \log_c m$, where λ is a positive constant and c > 1. 


Proof. Proving Theorem 6 amounts to proving that Axioms 1 and 2 are 
satisfied if and only if $h(m) = \lambda \log_c m$. The two axioms say that h(m) is 
monotonically increasing in m and 
$$h(mn) = h(m) + h(n) \tag{7}$$
According to Equation 7, if m = n = 1, then h(1) = h(1) + h(1), which implies 
that h(1) = 0. From this, together with both axioms above, $h(m) = \lambda \log_c m$ 
is sufficient as a solution. 
Next, we must prove that this solution is necessarily the only solution. 
Let a, b and c be positive integers with a, b, c > 1. Then there exists a unique 
integer d such that 
$$c^d \le a^b < c^{d+1} \tag{8}$$
From Equation 8 it follows that 
$$d \log c \le b \log a < (d+1) \log c$$
and therefore 
$$\frac{d}{b} \le \frac{\log a}{\log c} < \frac{d+1}{b} \tag{9}$$
Since h(m) is monotonically increasing, from Equation 8 we have 
$$h(c^d) \le h(a^b) \le h(c^{d+1})$$
Then from Equation 7, $d\,h(c) \le b\,h(a) \le (d+1)\,h(c)$. And since h(m) is 
monotonically increasing, $h(c) > h(1) = 0$, so we may divide by $b\,h(c)$ to get 
$$\frac{d}{b} \le \frac{h(a)}{h(c)} \le \frac{d+1}{b} \tag{10}$$
From Equations 9 and 10 it follows that 
$$\left| \frac{\log a}{\log c} - \frac{h(a)}{h(c)} \right| \le \frac{1}{b}$$
And, since b is an arbitrary positive integer, 
$$\frac{h(a)}{h(c)} = \frac{\log a}{\log c}, \quad \text{that is,} \quad \frac{h(a)}{\log a} = \frac{h(c)}{\log c}$$
Since a and c are arbitrary, the ratio $h(m)/\log_c m$ takes the same positive value λ 
for every m > 1, while $h(1) = 0 = \lambda \log_c 1$. Therefore, necessarily, $h(m) = \lambda \log_c m$ is the only solution. □ 


Axiom 3. The total uncertainty of events does not depend on the method 
of indication. 


§ 


Axiom 4. The uncertainty measure is a continuous function of the 
probabilities within it. 
§ 
Example 8. Let a set E of m disjoint events be 
$$\{E_1, \ldots, E_m\}$$
Let $j_i$, $i = 0, \ldots, n$, be integers with $0 = j_0 < j_1 < j_2 < \cdots < j_n = m$, and let E be 
divided into n sets of events, namely, 
$$G_1 = \{E_1, \ldots, E_{j_1}\}, \quad G_2 = \{E_{j_1+1}, \ldots, E_{j_2}\}, \quad \ldots, \quad G_n = \{E_{j_{n-1}+1}, \ldots, E_m\}$$
If we indicate first the group, and then the event within that group, then 
the uncertainty becomes 
$$H(p(G_1), \ldots, p(G_n)) + \sum_{i=1}^{n} p(G_i)\, H\big(p(E_{j_{i-1}+1} \mid G_i), \ldots, p(E_{j_i} \mid G_i)\big) \tag{11}$$

The grouping axiom, Axiom 3, lets us express the uncertainty when all 
the event probabilities are rational. By grouping equally likely events together 
and then considering each of the groups as a single event, it gives us the ability 
to deal with events which are not equally likely. Example 9 shows 
how this is done. Axiom 4 then extends Axiom 3 to cover irrational 
probabilities as well, and Equation 12 is the result. 


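Example 9. As in Example 8, let a set of disjoint events be 
$$E = \{E_1, \ldots, E_m\}$$
and let $p(E_i) = \frac{1}{m}$, $i = 1, \ldots, m$. Also, let the groups of events $G_1, \ldots, G_n$ be 
defined in the same way as there. Let $n_k$ be the number of events in $G_k$. Then 
$n_k = j_k - j_{k-1}$ and $p(G_k) = \frac{n_k}{m}$, for $k = 1, \ldots, n$, and also $p(E_i \mid G_k) = \frac{1}{n_k}$ 
for $j_{k-1} < i \le j_k$. Then Equation 11 yields 
$$h(m) = H(p(G_1), \ldots, p(G_n)) + \sum_{i=1}^{n} p(G_i)\, h(n_i)$$
And since, from Theorem 6, $h(m) = \lambda \log_c m$, and $\sum_{i=1}^{n} p(G_i) = 1$, we have 
$$\begin{aligned} H(p(G_1), \ldots, p(G_n)) &= -\sum_{i=1}^{n} p(G_i)\big(h(n_i) - h(m)\big) \\ &= -\sum_{i=1}^{n} p(G_i)\left(\lambda \log_c \frac{n_i}{m}\right) \\ &= -\lambda \sum_{i=1}^{n} p(G_i) \log_c p(G_i) \end{aligned} \tag{12}$$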


Example 10. From $h(m) = \lambda \log_c m$, if we let $\lambda = \log_b c$ for some base $b > 1$, then $h(m) = 
\log_b m$. In other words, the scale factor λ can be absorbed into the base of the 
logarithm. 


Theorem 7. Let $\{p_1, \ldots, p_m\}$ be a set of probabilities such that $\sum_{i=1}^m p_i = 
1$. Then† 
$$H(p_1, \ldots, p_m) = -\sum_{i=1}^{m} p_i \log p_i \tag{13}$$

Proof. This follows from Examples 8 and 9, and the scale factor λ 
disappears in a manner similar to that shown by Example 10. □ 

† Sometimes the entropy function is defined instead by $H(p_1, \ldots, p_m) = 
\sum_{i=1}^m p_i \log \frac{1}{p_i}$, but this is obviously the same as our Equation 13 since 
$\log x^{-1} = -\log x$. 
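Equation 13 and the grouping of Example 8 can be checked numerically. The sketch below uses base-2 logarithms and an arbitrarily chosen distribution (1/2, 1/4, 1/8, 1/8). Terms with $p_i = 0$ are skipped, in line with the limit $p \log p \to 0$. The function name `entropy` is our own.

```python
import math


def entropy(probs):
    """H(p_1, ..., p_m) = -sum p_i log2 p_i (Equation 13), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


p = [1 / 2, 1 / 4, 1 / 8, 1 / 8]
print(entropy(p))  # 1.75

# Grouping as in Example 8: G1 = {E1}, G2 = {E2, E3, E4}
groups = [[0], [1, 2, 3]]
p_group = [sum(p[i] for i in g) for g in groups]
grouped = entropy(p_group) + sum(
    pg * entropy([p[i] / pg for i in g])  # p(G_i) H(p(E | G_i), ...)
    for pg, g in zip(p_group, groups)
)
print(math.isclose(grouped, entropy(p)))  # True
```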
Example 11. If the base of the logarithm in Equation 13 is 2, the unit of 
the entropy is the bit. On the other hand, if this base is e, that is to say, if we 
use natural logarithms, then the uncertainty has the unit of the nat. From this, 
one may see that one nat is equal to $\log_2 e$ bits, which is approximately 1.443 
bits. The term bit comes from binary digit, the term nat from natural digit. 


Definition 21 explains what is meant by conditional entropy. Starting 
from Equation 14, which gives the conditional entropy when y is 
given, we obtain the overall conditional entropy in Theorem 8. For any pair 
of random variables X and Y, H(X|Y) gives the amount of uncertainty remaining 
about X after Y has been observed. 


Definition 21. The conditional entropy of X, given some $y \in Y$, is 
$$H(X \mid y) = -\sum_x p(x \mid y) \log p(x \mid y) \tag{14}$$
Then the conditional entropy H(X|Y) is the expectation, or average value, of 
H(X|y) over the range Y. In other words, 
$$H(X \mid Y) = \sum_y p(y) H(X \mid y) \tag{15}$$


Theorem 8. The conditional entropy is 
$$H(X \mid Y) = -\sum_{x, y} p(x, y) \log p(x \mid y)$$

Proof. Substituting the conditional entropy for a given y, Equation 14, into the 
overall conditional entropy, Equation 15, we get 
$$H(X \mid Y) = \sum_y p(y) H(X \mid y) = -\sum_y p(y) \sum_x p(x \mid y) \log p(x \mid y)$$
Then from Equation 1 of Definition 14, $p(y) p(x \mid y) = p(x, y)$, and so 
$$H(X \mid Y) = -\sum_{x, y} p(x, y) \log p(x \mid y)$$
□ 
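Theorem 8 can be checked on a small example. The joint distribution below is made up for illustration. The sketch computes H(X|Y) in bits in two ways: once as the average of H(X|y) over y (Equations 14 and 15), and once directly from the formula of Theorem 8.

```python
import math

# A made-up joint distribution p(x, y) with x in {"a", "b"} and y in {0, 1}
joint = {("a", 0): 0.3, ("b", 0): 0.1, ("a", 1): 0.2, ("b", 1): 0.4}

p_y = {}
for (x, y), pxy in joint.items():
    p_y[y] = p_y.get(y, 0.0) + pxy  # marginal p(y)

# Equations 14 and 15: H(X|Y) = sum_y p(y) H(X|y)
h_average = 0.0
for y, py in p_y.items():
    conditional = [pxy / py for (x, yy), pxy in joint.items() if yy == y]  # p(x|y)
    h_average += py * -sum(q * math.log2(q) for q in conditional if q > 0)

# Theorem 8: H(X|Y) = -sum_{x,y} p(x,y) log p(x|y)
h_direct = -sum(pxy * math.log2(pxy / p_y[y]) for (x, y), pxy in joint.items() if pxy > 0)

print(math.isclose(h_average, h_direct))  # True
```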
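Theorem 9. Let X, Y and Z be discrete random variables. For each $z \in Z$, 
let $E(z) = \sum_{x, y} p(y)\, p(z \mid x, y)$. Then 
$$H(X \mid Y) \le H(Z) + E[\log E(Z)]$$

Proof. 
$$H(X \mid Y) = -E[\log p(x \mid y)] = -\sum_{x, y, z} p(x, y, z) \log p(x \mid y) = \sum_z p(z) \sum_{x, y} \frac{p(x, y, z)}{p(z)} \log \frac{1}{p(x \mid y)}$$
For each fixed z, $p(x, y, z)/p(z) = p(x, y \mid z)$ is a probability distribution over (x, y), 
and the logarithm is a convex cap function, so we may apply Equation 6, namely 
Jensen's inequality for convex cap functions, from Definition 19. Hence 
$$H(X \mid Y) \le \sum_z p(z) \log \sum_{x, y} \frac{p(x, y \mid z)}{p(x \mid y)}$$
But 
$$\frac{p(x, y \mid z)}{p(x \mid y)} = \frac{p(x, y, z)}{p(z)\, p(x \mid y)} = \frac{p(z \mid x, y)\, p(x, y)}{p(z)\, p(x \mid y)} = \frac{p(y)\, p(z \mid x, y)}{p(z)}$$
so that 
$$H(X \mid Y) \le \sum_z p(z) \log \frac{E(z)}{p(z)} = H(Z) + E[\log E(Z)]$$
and the statement above is proved. □ 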


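Corollary 9[1]. Let X and Y be random variables each of which takes 
values in the set $\{x_1, \ldots, x_r\}$. Let $P_e = P(X \ne Y)$. Then 
$$H(X \mid Y) \le H(P_e) + P_e \log(r - 1)$$
where $H(P_e)$ stands for $H(P_e, 1 - P_e)$. This is known as Fano's inequality. 
Proof. In Theorem 9, let Z = 0 if X = Y, and let Z = 1 if X ≠ Y. Then 
E(0) = 1 and E(1) = r − 1, so that $H(Z) = H(P_e)$ and 
$E[\log E(Z)] = (1 - P_e) \log 1 + P_e \log(r - 1) = P_e \log(r - 1)$. □ 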
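Fano's inequality can be tried out numerically; logarithms here are to base 2. In the made-up example below, r = 3 and X equals Y with probability 0.6. When X ≠ Y, each of the two wrong values is equally likely. In this symmetric case the bound is attained, so the two sides agree up to rounding.

```python
import math


def h2(p):
    """Binary entropy H(p, 1 - p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


r = 3
# Made-up joint distribution: 0.2 on each diagonal pair, 0.4 / 6 on each off-diagonal pair
joint = {(x, y): (0.2 if x == y else 0.4 / 6) for x in range(r) for y in range(r)}
p_y = {y: sum(joint[x, y] for x in range(r)) for y in range(r)}

h_x_given_y = -sum(p * math.log2(p / p_y[y]) for (x, y), p in joint.items() if p > 0)
p_e = sum(p for (x, y), p in joint.items() if x != y)  # P(X != Y)
bound = h2(p_e) + p_e * math.log2(r - 1)

print(h_x_given_y <= bound + 1e-12)     # True
print(abs(h_x_given_y - bound) < 1e-9)  # True: equality in this symmetric case
```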
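Theorem 10. The maximum uncertainty occurs when the events are 
equiprobable. 

Proof. Since 
$$\begin{aligned} H\left(\frac{1}{m}, \ldots, \frac{1}{m}\right) - H(p_1, \ldots, p_m) &= \log_2 m + \sum_{i=1}^{m} p_i \log_2 p_i = \log_2 e \sum_{i=1}^{m} p_i \ln(m p_i) \\ &\ge \log_2 e \sum_{i=1}^{m} p_i \left(1 - \frac{1}{m p_i}\right) = 0 \end{aligned}$$
it being the case that $\ln \frac{1}{x} \ge 1 - x$ for $x > 0$, applied here with $x = \frac{1}{m p_i}$. 
(When some $p_i = 0$, those terms are omitted from the sums, and the final sum is then 
$1 - k/m \ge 0$, where k is the number of nonzero $p_i$.) Therefore $H(p_1, \ldots, p_m)$ is maximised 
when $p_i = \frac{1}{m}$ for all $i = 1, \ldots, m$. □ 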


Example 12. Figure 3 shows that $\ln x \le x - 1$, while Figure 4 shows that 
no such inequality holds when the logarithm in question is of base 10. 


Figure 3 Plots of ln x and x − 1, which show that ln x ≤ x − 1. 
[Plot: ln x and x − 1 against x.] 


Figure 4 Graphs of $y = \log_{10} x$ and $y = x - 1$, which show that the latter 
is no bound for the values of the former. 
[Plot: $\log_{10} x$ and x − 1 against x.] 


Example 13. Figure 5 confirms for us that $\ln \frac{1}{x} \ge 1 - x$, whereas Figure 6 
tells us that this is not the case for $\log_{10} \frac{1}{x}$. 


Figure 5 Plots of ln(1/x) and 1 − x, which show that ln(1/x) ≥ 1 − x. 
[Plot: ln(1/x) and 1 − x against x.] 


Figure 6 Graphs of $y = \log_{10}(1/x)$ and $y = 1 - x$, from which it is clear that 
the latter gives no bound for the former. 
[Plot: $\log_{10}(1/x)$ and 1 − x against x.] 

Example 14. Consider two events with probabilities p and 1 − p. The 
entropy function is then 
$$H(p, 1-p) = -p \log p - (1-p) \log(1-p)$$
Whenever the occurrence of either event becomes certain, the entropy function 
becomes zero. Mathematically, we see that $\lim_{p \to 0} p \log p = 0$ and 
$\lim_{p \to 1} p \log p = 0$. Figure 7 shows a plot of the values of the entropy function 
for two events, using base-2 logarithms. 


Figure 7 The entropy function of two events with probabilities p and 
1 − p. 
[Plot: $H(p, 1-p) = -p \log_2 p - (1-p) \log_2(1-p)$ against p.] 
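The entropy function of Example 14 is easy to tabulate. This sketch uses base-2 logarithms and treats p = 0 and p = 1 through the limits above; the name `binary_entropy` is our own. On a grid of step 0.01 it confirms that the maximum, 1 bit, is at p = 1/2, in agreement with Theorem 10.

```python
import math


def binary_entropy(p):
    """H(p, 1 - p) in bits; H = 0 at p = 0 and p = 1 by the limits p log p -> 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


grid = [i / 100 for i in range(101)]
values = [binary_entropy(p) for p in grid]

print(binary_entropy(0.5))                  # 1.0
print(grid[values.index(max(values))])      # 0.5
print(math.isclose(binary_entropy(0.1), binary_entropy(0.9)))  # True: symmetric in p and 1 - p
```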


Definition 22. The mutual information is I(X;Y) = H(X) — H(X|Y). It 
represents the information provided about X by Y. 


§ 


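Example 15. Alternatively, the mutual information may take the following 
forms, cf. Definition 14: 
$$\begin{aligned} I(X;Y) &= -\sum_x p(x) \log p(x) + \sum_{x,y} p(x,y) \log p(x \mid y) \\ &= \sum_{x,y} p(x,y) \log \frac{p(x \mid y)}{p(x)} \\ &= \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\, p(y)} \\ &= \sum_{x,y} p(x,y) \log \frac{p(y \mid x)}{p(y)} \end{aligned}$$
That is to say, I(X;Y) is the average, taken over the X, Y sample space, of the 
random variable I(x; y) such that 
$$I(x;y) = \log \frac{p(x \mid y)}{p(x)} = \log \frac{p(x,y)}{p(x)\, p(y)} = \log \frac{p(y \mid x)}{p(y)}$$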


Theorem 11. For any discrete random variables X and Y, $I(X;Y) \ge 0$. 
Moreover, I(X;Y) = 0 if and only if X and Y are independent. 


Proof. From one of our formulae for the mutual information and from 


18 14 January, 2007 God’s Ayudhya’s Defence 


Kit Tyabandha, PhD Coding Theory, notes and projections from lecture 


Jensen's inequality, 
$$-I(X;Y) = \sum_{x,y} p(x,y) \log \frac{p(x)\, p(y)}{p(x,y)} \le \log \sum_{x,y} p(x)\, p(y) = \log 1 = 0$$
Furthermore, the equality sign holds if and only if $p(x)p(y) = p(x,y)$ for all 
x and y, that is to say, when X and Y are independent of each other. □ 
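The formulae of Example 15 and Theorem 11 can be compared on small made-up distributions. The sketch computes I(X;Y) in bits in two ways: as H(X) − H(X|Y) (Definition 22), and as the average of log p(x,y)/(p(x)p(y)). It then shows that the mutual information vanishes for an independent pair. The helper names are our own.

```python
import math


def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)


def marginals(joint):
    """Return the marginal distributions p(x) and p(y) of a joint distribution."""
    p_x, p_y = {}, {}
    for (x, y), p in joint.items():
        p_x[x] = p_x.get(x, 0.0) + p
        p_y[y] = p_y.get(y, 0.0) + p
    return p_x, p_y


def mi_from_entropies(joint):
    """I(X;Y) = H(X) - H(X|Y), with H(X|Y) = sum_y p(y) H(X|y)."""
    p_x, p_y = marginals(joint)
    h_x_given_y = sum(
        py * entropy([joint.get((x, y), 0.0) / py for x in p_x]) for y, py in p_y.items()
    )
    return entropy(p_x.values()) - h_x_given_y


def mi_direct(joint):
    """I(X;Y) = sum p(x,y) log p(x,y) / (p(x) p(y))."""
    p_x, p_y = marginals(joint)
    return sum(p * math.log2(p / (p_x[x] * p_y[y])) for (x, y), p in joint.items() if p > 0)


dependent = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
independent = {(x, y): px * py for x, px in [(0, 0.3), (1, 0.7)] for y, py in [(0, 0.6), (1, 0.4)]}

print(math.isclose(mi_from_entropies(dependent), mi_direct(dependent)))  # True
print(mi_direct(dependent) > 0)             # True
print(abs(mi_direct(independent)) < 1e-12)  # True
```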


Example 16. From our formulae for the mutual information, we may see 
that 
$$I(X;Y) = I(Y;X)$$
and 
$$I(X;Y) = H(Y) - H(Y \mid X)$$
Also, 
$$I(X;X) = \sum_x p(x) \log \frac{1}{p(x)} = H(X)$$


Definition 23. Let X, Y and Z be three random variables. Then the mutual 
information I(X, Y; Z) is given by 
$$I(X,Y;Z) = E\left(\log \frac{p(z \mid x, y)}{p(z)}\right) = \sum_{x,y,z} p(x,y,z) \log \frac{p(z \mid x, y)}{p(z)}$$
This mutual information is the amount of information X and Y provide about 
Z. 
§ 


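Theorem 12. Let X, Y and Z be three random variables. Then we have 
$I(X,Y;Z) \ge I(Y;Z)$, where the equality holds if and only if $p(z \mid x, y) = p(z \mid y)$ 
for all (x, y, z) such that p(x, y, z) > 0. 

Proof. 
$$I(Y;Z) - I(X,Y;Z) = E\left(\log \frac{p(z \mid y)}{p(z)} - \log \frac{p(z \mid x, y)}{p(z)}\right) = \sum_{x,y,z} p(x,y,z) \log \frac{p(z \mid y)}{p(z \mid x, y)}$$
Then using Jensen's inequality, we have 
$$I(Y;Z) - I(X,Y;Z) \le \log \sum_{x,y,z} p(x,y,z) \frac{p(z \mid y)}{p(z \mid x, y)} = \log \sum_{x,y,z} p(x,y)\, p(z \mid y) = \log 1 = 0$$
Equality $I(Y;Z) = I(X,Y;Z)$ requires equality in Jensen's inequality, so the ratio 
$p(z \mid y)/p(z \mid x, y)$ must be constant wherever p(x, y, z) > 0; the difference is then 
the logarithm of that constant, which must therefore be 1, that is, $p(z \mid x, y) = p(z \mid y)$. □ 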
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Theorem 13. Let (X,Y, Z) be a Markoy chain. Then, 


1x2) < {TE 


Proof. From Theorem 12, with the roles of X and Y interchanged, I(X;Z) ≤ I(X,Y;Z). Because (X,Y,Z) is a 
Markov chain, p(z|x,y) = p(z|y), so I(X,Y;Z) = I(Y;Z). Therefore I(X;Z) ≤ I(Y;Z). Next, since 
(X,Y,Z) is a Markov chain, (Z,Y,X) is also a Markov chain, and the same argument gives 
I(Z;X) ≤ I(Y;X), that is, I(X;Z) ≤ I(X;Y). ∎
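Theorem 13 can be checked numerically on a small example. The sketch below assumes a hypothetical chain in which X is a uniform bit, Y is X sent through a binary symmetric channel with crossover probability 0.1, and Z is Y sent through another with crossover probability 0.2; these channels and all helper names are illustrative, not taken from the notes.

```python
from math import log2

def mutual_information(joint):
    """I(X;Y) in bits for a joint table joint[x][y]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(p * log2(p / (px[i] * py[j]))
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0)

def bsc(eps):
    """Transition matrix channel[i][j] = p(j|i) of a binary symmetric channel."""
    return [[1 - eps, eps], [eps, 1 - eps]]

def joint_from(p_in, channel):
    """Joint table joint[i][j] = p(i) p(j|i)."""
    return [[p_in[i] * channel[i][j] for j in range(len(channel[i]))]
            for i in range(len(p_in))]

def compose(ch1, ch2):
    """End-to-end channel p(z|x) = sum_y p(y|x) p(z|y) of the chain X -> Y -> Z."""
    return [[sum(ch1[x][y] * ch2[y][z] for y in range(len(ch2)))
             for z in range(len(ch2[0]))]
            for x in range(len(ch1))]

px = [0.5, 0.5]                      # assumed uniform source X
ch_xy, ch_yz = bsc(0.1), bsc(0.2)    # assumed crossover probabilities
py = [sum(px[x] * ch_xy[x][y] for x in range(2)) for y in range(2)]

i_xy = mutual_information(joint_from(px, ch_xy))
i_yz = mutual_information(joint_from(py, ch_yz))
i_xz = mutual_information(joint_from(px, compose(ch_xy, ch_yz)))
print(f"I(X;Y) = {i_xy:.4f}, I(Y;Z) = {i_yz:.4f}, I(X;Z) = {i_xz:.4f}")
```

Processing Y further to obtain Z can only lose information about X, which is what the printed values illustrate.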
Group, field and finite field 
11th November 2005 


Definition 24. A group is a non-empty set G together with an operation, 
called multiplication, which associates with each ordered pair x, y of elements 
in G a third element, their product, in G such that, 

1. multiplication is associative; 

2. there exists an identity element e in G; and 

3. for each element x in G there exists an inverse of x. 


In other words, for x and y in G there exists xy in G such that, 

1. for any x, y and z in G, x(yz) = (xy)z; 

2. there exists e in G such that xe = ex = x; and 

3. to each x in G there corresponds x^{-1} in G such that xx^{-1} = x^{-1}x = e. 
A group is called an Abelian or commutative group if xy = yx for all elements 
x and y in G. The group G is called a finite group if it consists of a finite 
number of elements, otherwise it is called an infinite group. The number of 
elements of a finite group G is called its order. 


§ 


Theorem 14. Both the identity e of a group G and the inverse x^{-1} of each element x of G are 
unique. 


Proof. Suppose e′ is another element in G such that xe′ = e′x = x 
for every x in G. Then e′ = e′e = e, the first equality because e is an identity and the second 
because e′ is, hence the identity element is unique. 
Suppose that, for some x in G, x′ is another element in G such that xx′ = 
x′x = e. Then, 


x′ = x′e = x′(xx^{-1}) = (x′x)x^{-1} = ex^{-1} = x^{-1} 


hence the inverse of each element of G is unique. ∎


Definition 25. A ring is an additive Abelian group R which is closed under 
a second operation, called multiplication, in such a manner that, 

1. multiplication is associative; and 

2. multiplication is distributive. 


That is to say, if x, y and z are any three elements in R, then, 

1. x(yz) = (xy)z; and 

2. x(y + z) = xy + xz and (x + y)z = xz + yz. 
A ring is called a commutative ring if xy = yx for all elements x and y in R. 
If a ring R has a non-zero element 1 with the property that x1 = 1x = x 
for every x, then 1 is called an identity element, and R is said to be a ring 
with identity. 

§
Definition 26. Let x be an element of R, a ring with identity. Then 
x is said to be regular if its inverse x^{-1} exists, otherwise it is said to be 
singular. Regular elements are also called invertible or non-singular elements. 
Furthermore, R is called a division ring if all its non-zero elements are regular. 


§ 


Definition 27. A field is a commutative division ring. 


§ 


Example 17. A field, then, is a non-empty set F together with two operations 
on its elements, namely addition and multiplication, such that for all 
a, b and c in F: under addition, F is closed, commutative and associative, has a 
unique identity, and has for each of its elements a unique inverse; and under 
multiplication, F is closed, commutative and associative, has a unique identity, and has 
for each of its non-zero elements a unique inverse. Furthermore, multiplication is 
distributive over addition. 


These properties of the field are inherited from its progenitors, since the 
field is defined in terms of the division ring, which itself is defined in terms of the ring, 
which itself is defined in terms of the group. Table 1 shows the definition from which each of 
the properties of the field comes. 


operator        property      defining definition
addition        closed        group
                commutative   Abelian group
                associative   group
                identity      group
                inverse       group
multiplication  closed        ring
                commutative   commutative ring
                associative   ring
                identity      ring with identity
                inverse       division ring


Table 1 The definitions from which the various properties of the field are 
derived. 


Theorem 15. For any element a in a field F, we have 
(−1)·a = −a. 
Proof. Since, 


(−1)·a + a = (−1)·a + 1·a = ((−1) + 1)·a = 0·a = 0 


and since a + (−a) = 0, the uniqueness of the additive inverse (Theorem 14) gives (−1)·a = −a. ∎


Theorem 16. Let a and b be any two elements in a field F. Then ab = 0 
implies a = 0 or b = 0. 


Proof. If a ≠ 0, then, 
0 = a^{-1}·0 = a^{-1}(ab) = (a^{-1}a)b = 1·b = b 
So either a = 0, or a ≠ 0 and then b = 0, which proves the statement. ∎


Definition 28. Let a, b and m be integers, and let m > 1. Then a is said 
to be congruent to b modulo m, in other words, 


a ≡ b (mod m) 


if m | (a − b), that is to say, m divides a − b. The number m is called the 
modulus, and b is called a residue of a (mod m). Sometimes b is also called 
the principal remainder of a divided by m, and denoted by (a (mod m)). A 
residue is said to be common if 0 ≤ b < m. 


§ 


Theorem 17. Any integer a is congruent to exactly one of 0, 1, ..., m − 1 
modulo m. 


Proof. Let a and m be integers, and let m > 1. By the division algorithm there exist 
integers k and b such that a = mk + b, where 0 ≤ b ≤ m − 1, so that a ≡ b (mod m). 

To prove that b is unique, suppose that a = mk_1 + b_1 and a = mk_2 + b_2, 
where 0 ≤ b_1 ≤ m − 1 and 0 ≤ b_2 ≤ m − 1, with b_1 ≠ b_2. Then 
a − mk_1 ≠ a − mk_2, and therefore k_1 ≠ k_2. Without loss of generality 
let k_1 > k_2, and let k_1 = k_2 + n with n ≥ 1. Then, 


mk_2 + b_2 = a = m(k_2 + n) + b_1 = mk_2 + b_1 + mn 


so b_2 = b_1 + mn, and since b_1 ≥ 0, m > 1 and n ≥ 1, we have b_2 ≥ m, which contradicts 
b_2 ≤ m − 1. So necessarily b_1 = b_2. ∎
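In Python the built-in divmod already returns the pair (k, b) of Theorem 17 with 0 ≤ b ≤ m − 1, because integer division rounds towards minus infinity; in C, by contrast, % can return a negative value when a is negative. A minimal sketch, with an illustrative function name:

```python
def principal_remainder(a, m):
    """Return (k, b) with a = m*k + b and 0 <= b <= m - 1 (Theorem 17)."""
    if m <= 1:
        raise ValueError("the modulus must satisfy m > 1")
    return divmod(a, m)   # Python floors the quotient, so b is never negative

for a in (17, -7):
    k, b = principal_remainder(a, 3)
    print(f"{a} = 3*({k}) + {b}")
```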


Theorem 18. Let a, b and m be integers, and let m > 1. Then the 
following properties hold for congruence. 

a. a ≡ b (mod 0) implies a = b 

b. either a ≡ b (mod m) or a ≢ b (mod m) 

c. a ≡ a (mod m) 

d. a ≡ b (mod m) implies b ≡ a (mod m) 

e. if a ≡ b (mod m) and b ≡ c (mod m), then a ≡ c (mod m) 


Let a ≡ b (mod m) and c ≡ d (mod m). Then, 
f. a + c ≡ b + d (mod m) 
g. a − c ≡ b − d (mod m) 
h. ac ≡ bd (mod m) 
Further, let k be an integer and n a non-negative integer. Then, 
i. if a ≡ b (mod m), then ka ≡ kb (mod m) 
j. if a ≡ b (mod m), then a^n ≡ b^n (mod m) 
k. if a ≡ b (mod m_1) and a ≡ b (mod m_2), then, 

a ≡ b (mod lcm(m_1, m_2)) 

where lcm(x, y) is the least common multiple of x and y, that is, the 
smallest positive z such that there exist positive integers p and q with px = 
qy = z 
l. if ka ≡ kb (mod m), then, 

a ≡ b (mod m/gcd(k, m)) 

From the above properties, it follows that, 
m. if a ≡ b (mod m), then P(a) ≡ P(b) (mod m), where P(x) is a polynomial with integer coefficients. 
Property (a) is called equivalence, (b) determination, (c) reflexivity, (d) symmetry, 
and (e) transitivity. 

§


Definition 29. We denote by Z_m, or Z/(m), the set {0, ..., m − 1}, where 
m > 1, and define the addition and multiplication on it as, 


a ⊕ b = (a + b (mod m)) 


and 
a ⊙ b = (ab (mod m)) 


respectively, and these may be denoted as a + b and ab respectively for 
simplicity. 
§


Example 18. The set Z_m together with the addition and multiplication introduced 
in Definition 29 forms a ring. 


Theorem 19. The ring Z_m is a field if and only if m is prime. 


Proof. First we prove that m being prime implies that Z_m is a field. Let m 
be a prime. Then any a ≠ 0 in Z_m, in other words 0 < a < m, is relatively prime 
to m. Therefore, there exist two integers u and v, where 0 < u ≤ m − 1, such 
that ua + vm = 1, which means that ua ≡ 1 (mod m). Hence u = a^{-1}, and 
since this applies to every non-zero a in Z_m, it follows that Z_m is a field. 

Next we prove that if m is not a prime, then Z_m is not a field. Suppose 
that m is not a prime. Then m = ab for some a and b, where 1 < a < m 
and 1 < b < m. But ab = 0 in Z_m, so if Z_m were a field, Theorem 16 would give 
a = 0 or b = 0. This contradicts the values of a and b given above, thus Z_m is not a field. ∎
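The first half of the proof of Theorem 19 is constructive: the integers u and v with ua + vm = 1 come from the extended Euclidean algorithm. A minimal sketch, with illustrative function names, that computes inverses in Z_m and tests whether Z_m is a field:

```python
def extended_gcd(a, b):
    """Return (g, u, v) with g = gcd(a, b) = u*a + v*b."""
    if b == 0:
        return a, 1, 0
    g, u, v = extended_gcd(b, a % b)
    # g = u*b + v*(a - (a // b)*b) = v*a + (u - (a // b)*v)*b
    return g, v, u - (a // b) * v

def inverse_mod(a, m):
    """Multiplicative inverse of a in Z_m, or None when gcd(a, m) != 1."""
    g, u, _ = extended_gcd(a % m, m)
    return u % m if g == 1 else None

def is_field(m):
    """Z_m is a field iff every non-zero element is invertible (Theorem 19)."""
    return all(inverse_mod(a, m) is not None for a in range(1, m))

print(inverse_mod(3, 7), inverse_mod(2, 6))   # 5 None
print([m for m in range(2, 20) if is_field(m)])
```

The second line lists exactly the primes below 20, as Theorem 19 predicts.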


Definition 30. We denote by na the element Σ_{i=1}^{n} a = a + ··· + a (n terms) for any element a in 
a ring R and an integer n ≥ 1. 


§ 
Definition 31. Let F be a field. Then the characteristic of F is the least 
positive integer p such that p·1 = 0, where 1 is the multiplicative identity of 
F. Where no such p exists, the characteristic is defined to be zero. 
By F^* we mean F \ {0}. 

§ 
Theorem 20. The characteristic of a field is either zero or a prime number. 


Proof. Consider a field F. Since 1·1 = 1 ≠ 0, 1 is not the 
characteristic of F. Suppose the characteristic is p = mn, where 1 < m < p and 
1 < n < p. If a = m·1 and b = n·1, then, 


$$a\cdot b = (m\cdot 1)(n\cdot 1) = \Big(\sum_{i=1}^{m} 1\Big)\Big(\sum_{j=1}^{n} 1\Big) = mn\cdot 1 = p\cdot 1 = 0$$


By Theorem 16 this implies a = 0 or b = 0, that is, m·1 = 0 or n·1 = 0 with m, n < p, 
which contradicts the minimality of p. ∎


Definition 32. Let E and F be two fields, and let F be a subset of E. 
Then F is called a subfield of E if the addition and multiplication of E, when 
restricted to F, are the same as those of F. 


§ 


Theorem 21. A finite field F of characteristic p contains p^n elements for 
some integer n ≥ 1. 


Proof. Since F is finite, its characteristic p is not zero, so by Theorem 20 p is prime. 
Choose an element α_1 from F^*. Then 0·α_1, ..., (p − 1)·α_1 are 
pairwise distinct, for if i·α_1 = j·α_1 for some 0 ≤ i < j ≤ 
p − 1, then ((j − i)·1)α_1 = 0, and since α_1 ≠ 0, Theorem 16 gives (j − i)·1 = 0 with 
0 < j − i < p, which contradicts the definition of p as the characteristic of F. 
Hence i = j. 

Next, if F \ {0·α_1, ..., (p − 1)·α_1} is not empty we choose α_2 from it. Then 
the elements a_1α_1 + a_2α_2 are pairwise distinct for all 0 ≤ a_1, a_2 ≤ p − 1, for if 
a_1α_1 + a_2α_2 = b_1α_1 + b_2α_2 for some 0 ≤ a_1, a_2, b_1, b_2 ≤ p − 1, then necessarily 
a_2 = b_2, because otherwise, 


$$\alpha_2 = \frac{a_1 - b_1}{b_2 - a_2}\,\alpha_1$$


which contradicts the way we have chosen α_2. It then follows that (a_1, a_2) = 
(b_1, b_2). 

Since F is finite, we may continue in this fashion to α_3, α_4, and so on until 
α_n for some integer n, choosing each α_j, 2 ≤ j ≤ n, from F \ {Σ_{i=1}^{j−1} a_iα_i}, 
where a_i, i = 1, ..., j − 1, range over Z_p. 


In the end, F = {Σ_{i=1}^{n} a_iα_i}, where a_1, ..., a_n are in Z_p. In the same manner 
as above, we may show that the elements a_1α_1 + ... + a_nα_n are pairwise distinct from 
each other for all a_i in Z_p, where i = 1, ..., n. Therefore |F| = p^n. ∎


Definition 33. Let F be a field. Then the set, 


$$F[x] = \Big\{\sum_{i=0}^{n} a_i x^i\Big\}$$


where a_i is an element in F and n ≥ 0, is called the polynomial ring over 
F. An element of F[x] is called a polynomial over F. For a polynomial 
f(x) = Σ_{i=0}^{n} a_i x^i, provided that a_n ≠ 0, the integer n is called the degree of 
f(x), denoted by deg(f(x)). We define deg(0) = −∞. A nonzero polynomial 
f(x) of degree n is said to be monic if a_n = 1. Furthermore, a polynomial 
f(x) is said to be reducible over F if there exist two polynomials g(x) and 
h(x) over F such that deg(g(x)) < deg(f(x)), deg(h(x)) < deg(f(x)), and 
f(x) = g(x)h(x). A polynomial is said to be irreducible over F if it is not 
reducible. 


§ 


Definition 34. Let f(x) in F[x] be a polynomial of degree n ≥ 1. Then, 
for any polynomial g(x) in F[x] there exists a unique pair (s(x), r(x)) of 
polynomials, where deg(r(x)) < deg(f(x)) or r(x) = 0, such that g(x) = 
s(x)f(x) + r(x). Here r(x) is called the principal remainder of g(x) divided 
by f(x), or in our notation (g(x) (mod f(x))). 
§ 


Definition 35. Let f(x) and g(x) in F[x] be two nonzero polynomials. 
The greatest common divisor of f(x) and g(x), written gcd(f(x), g(x)), is the 
monic polynomial of the highest degree which is a divisor of both f(x) and 
g(x). Two polynomials f(x) and g(x) are said to be co-prime, or prime to 
each other, if gcd(f(x), g(x)) = 1. The least common multiple of f(x) and 
g(x), namely lcm(f(x), g(x)), is the monic polynomial of the lowest degree 
which is a multiple of both f(x) and g(x). 

§ 


Example 19. Let the factorisations of two polynomials f(x) and g(x) be, 


$$f(x) = a\,(p_1(x))^{e_1}\cdots(p_n(x))^{e_n}$$


and 

$$g(x) = b\,(p_1(x))^{d_1}\cdots(p_n(x))^{d_n}$$


where a and b are in F, e_i, d_i ≥ 0, and p_i(x) are distinct monic irreducible 
polynomials. Then, 


$$\gcd(f(x), g(x)) = (p_1(x))^{\min(e_1,d_1)}\cdots(p_n(x))^{\min(e_n,d_n)}$$


and 

$$\mathrm{lcm}(f(x), g(x)) = (p_1(x))^{\max(e_1,d_1)}\cdots(p_n(x))^{\max(e_n,d_n)}$$


Example 20. Let f(x) and g(x) in F[x] be two nonzero polynomials. Then, 
there exist two polynomials u(x) and v(x) having deg(u(x)) < deg(g(x)) and 
deg(v(x)) < deg(f(x)) such that, 


gcd(f(x), g(x)) = u(x)f(x) + v(x)g(x) 


Moreover, for any h(x) in F[x], 
gcd(f(x)h(x), g(x)) = gcd(f(x), g(x)) 
if gcd(h(x), g(x)) = 1. 
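Division with remainder (Definition 34) and the extended Euclidean algorithm behind Example 20 can be sketched for the special case F = GF(p) with p prime. The sketch assumes polynomials are stored as coefficient lists, lowest degree first, so [1, 1, 0, 1] stands for 1 + x + x^3; all function names are illustrative, and pow(a, -1, p) needs Python 3.8 or later.

```python
def trim(f):
    """Drop trailing zero coefficients; the zero polynomial is []."""
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f

def poly_mul(a, b, p):
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return trim(out)

def poly_sub(a, b, p):
    n = max(len(a), len(b))
    a, b = list(a) + [0] * (n - len(a)), list(b) + [0] * (n - len(b))
    return trim([(x - y) % p for x, y in zip(a, b)])

def poly_divmod(g, f, p):
    """Return (s, r) with g = s*f + r and deg r < deg f, over GF(p)."""
    g, f = trim([c % p for c in g]), trim([c % p for c in f])
    if not f:
        raise ZeroDivisionError("division by the zero polynomial")
    inv_lead = pow(f[-1], -1, p)          # inverse of the leading coefficient
    s, r = [0] * max(len(g) - len(f) + 1, 0), g[:]
    while len(r) >= len(f):
        shift, coef = len(r) - len(f), r[-1] * inv_lead % p
        s[shift] = coef
        for i, c in enumerate(f):         # subtract coef * x^shift * f(x)
            r[shift + i] = (r[shift + i] - coef * c) % p
        r = trim(r)                       # the leading term has cancelled
    return trim(s), r

def poly_ext_gcd(f, g, p):
    """Return (d, u, v) with d = gcd(f, g) monic and d = u*f + v*g."""
    r0, r1 = trim([c % p for c in f]), trim([c % p for c in g])
    u0, u1, v0, v1 = [1], [], [], [1]     # invariant: r_i = u_i*f + v_i*g
    while r1:
        q, r = poly_divmod(r0, r1, p)
        r0, r1 = r1, r
        u0, u1 = u1, poly_sub(u0, poly_mul(q, u1, p), p)
        v0, v1 = v1, poly_sub(v0, poly_mul(q, v1, p), p)
    inv = pow(r0[-1], -1, p)              # scale so that the gcd is monic
    return tuple(trim([c * inv % p for c in h]) for h in (r0, u0, v0))

print(poly_divmod([1, 0, 0, 1], [1, 1], 2))   # 1 + x^3 = (1 + x)(1 + x + x^2)
print(poly_ext_gcd([1, 1, 0, 1], [1, 1, 1, 1, 1], 2))
```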
Theorem 22. Let f(x) be a polynomial of degree n over a field F, where 
n ≥ 1. Then F[x]/(f(x)), together with the addition, 


g(x) ⊕ h(x) = (g(x) + h(x) (mod f(x))) 
also written g(x) + h(x), and multiplication, 
g(x) ⊙ h(x) = (g(x)h(x) (mod f(x))) 


also written g(x)·h(x), forms a ring. Furthermore, F[x]/(f(x)) is a field if 
and only if f(x) is irreducible. 
§ 


Exercise 9. Prove Theorem 22. 


§ 


Example 21. Consider the ring Z_2[x]/(1 + x^2) = {0, 1, x, 1 + x}. Its addition 
and multiplication tables are shown in Table 2. 


+      0      1      x      1+x
0      0      1      x      1+x
1      1      0      1+x    x
x      x      1+x    0      1
1+x    1+x    x      1      0

·      0      1      x      1+x
0      0      0      0      0
1      0      1      x      1+x
x      0      x      1      1+x
1+x    0      1+x    1+x    0


Table 2 Addition and multiplication tables for Z_2[x]/(1 + x^2). 


Example 22. Consider the ring Z_2[x]/(1 + x + x^2). Its addition and 
multiplication tables are given in Table 3. 


+      0      1      x      1+x
0      0      1      x      1+x
1      1      0      1+x    x
x      x      1+x    0      1
1+x    1+x    x      1      0

·      0      1      x      1+x
0      0      0      0      0
1      0      1      x      1+x
x      0      x      1+x    1
1+x    0      1+x    1      x


Table 3 Addition and multiplication tables for Z_2[x]/(1 + x + x^2). 
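Tables such as Tables 2 and 3 can be generated mechanically. This sketch assumes binary polynomials are stored as integer bit masks (bit i holds the coefficient of x^i, so 0b111 is 1 + x + x^2); addition over Z_2 is then bitwise XOR, and multiplication is shift-and-add with reduction by f(x). The names are illustrative, and the last check follows Theorem 22: the quotient ring is a field exactly when every non-zero residue has an inverse.

```python
def gf2_mulmod(a, b, f):
    """Product of a and b in Z_2[x]/(f(x)); a and b must already be reduced mod f."""
    deg_f = f.bit_length() - 1
    result = 0
    while b:
        if b & 1:
            result ^= a          # adding polynomials over Z_2 is XOR
        b >>= 1
        a <<= 1                  # multiply a by x ...
        if (a >> deg_f) & 1:     # ... and reduce once its degree reaches deg f
            a ^= f
    return result

def name(v):
    """Write a bit mask as a polynomial such as 1+x."""
    terms = ["1" if i == 0 else "x" if i == 1 else f"x^{i}"
             for i in range(v.bit_length()) if (v >> i) & 1]
    return "+".join(terms) or "0"

def mult_table(f):
    """Multiplication table over the 2^deg(f) residues 0, 1, x, 1+x, ..."""
    size = 1 << (f.bit_length() - 1)
    return [[gf2_mulmod(a, b, f) for b in range(size)] for a in range(size)]

def is_field(f):
    """Every non-zero residue has a multiplicative inverse."""
    return all(1 in row for row in mult_table(f)[1:])

for f in (0b101, 0b111):         # 1 + x^2 (Table 2) and 1 + x + x^2 (Table 3)
    print(name(f), "field" if is_field(f) else "not a field")
    for row in mult_table(f):
        print("  ", [name(v) for v in row])
```

In the 1 + x^2 case the row of 1 + x contains no 1, since (1 + x)^2 = 0 there, so that ring is not a field; 1 + x^2 = (1 + x)^2 is reducible over Z_2.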
Example 23. Table 4 shows the analogies between Z and F[x]. 
Theorem 23. For every element φ of a finite field F with n elements, 
φ^n = φ. 
Proof. The case when φ = 0 is trivial. Next, if φ ≠ 0, then we could list 
all the nonzero elements of F as F^* = {φ_1, ..., φ_{n−1}}. Since F^* is closed under 
multiplication and multiplication by φ ≠ 0 is one to one, multiplying each element of F^* 
by φ gives F^* = {φφ_1, ..., φφ_{n−1}} again. Therefore 
φ_1···φ_{n−1} = (φφ_1)···(φφ_{n−1}) = φ^{n−1}φ_1···φ_{n−1}, and cancelling the non-zero 
product φ_1···φ_{n−1} leads to φ^{n−1} = 1, hence φ^n = φ. ∎
Corollary 23[1]. Let F be a subfield of E, and let |F| = n. Then an 
element φ of E is also in F if and only if φ^n = φ. 


Proof. The only if part was already proved in Theorem 23. For the if part, 
if φ satisfies φ^n = φ, then it is a root of x^n − x. And since |F| = n means that 
all n elements of F are roots of x^n − x, which has at most n roots in the field E, these are 
all of its roots, and it follows that φ lies in F. ∎


Definition 36. We denote a finite field with q elements by F_q or GF(q). 
Let α be a root of an irreducible polynomial f(x) of degree n over a field 
F. Then, if we replace x in F[x]/(f(x)) by α, the field F[x]/(f(x)) can be 
represented as, 


$$F[\alpha] = \Big\{\sum_{i=0}^{n-1} a_i\alpha^i\Big\}$$


for a_i in F. 

§
Definition 37. An element α in a finite field F_q is called a primitive element, 
or generator, of F_q if F_q = {0, α, α^2, ..., α^{q−1}}. 

§


Definition 38. The order, ord(a), of a nonzero element a in F_q is the 
smallest positive integer k such that a^k = 1. 
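For a prime field F_p the orders of Definition 38 and the primitive elements of Definition 37 can be found by brute force. A minimal sketch, restricted to prime p so that arithmetic is ordinary integer arithmetic mod p; the function names are illustrative. The last line also checks Theorem 23 with n = p.

```python
def order(a, p):
    """ord(a) in F_p: the least k >= 1 with a^k = 1 (p prime)."""
    a %= p
    if a == 0:
        raise ValueError("0 has no multiplicative order")
    k, power = 1, a
    while power != 1:
        power = power * a % p
        k += 1
    return k

def primitive_elements(p):
    """a generates F_p^* exactly when ord(a) = p - 1."""
    return [a for a in range(1, p) if order(a, p) == p - 1]

p = 7
print({a: order(a, p) for a in range(1, p)})
print(primitive_elements(p))
print(all(pow(a, p, p) == a for a in range(p)))   # Theorem 23: a^p = a
```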


§ 
Exercises for Group, field and finite field 
14th January, 2007 


15. Write the addition and multiplication tables for GF(5) = {0, 1, 2, 3, 4}. 
16. Find the principal remainder when 83 - 54 is divided by 7. 
17. Determine whether (3'8) (13°) + 1 is divisible by 17 
18. Prove that 1 + x + x^3 and 1 + x^2 + x^3 are the only irreducible polynomials 
of degree 3 over F_2. 
19. Is GF(4) a subfield of GF(8)? Explain. 
20. Construct the addition and multiplication tables for the rings Zs. 
21. Find the multiplicative inverse of 3, 6, 10 in Z1. 
22. Show that the polynomials 1 + x^2 and 2 + 2x + x^2 over F_3 are irreducible. 
23. Factorise the polynomials 2” — 1 over F3, 22° — 1 over F7 and «U4 — 1 
over Fs. 
24. Determine the number of primitive elements in the fields F1o, Fi, and 
F309. 
25. Find the number of monic irreducible cubic polynomials over F,. 
26. Find all the cyclotomic cosets of 2 modulo 33. 
27. Let 

f(x) = (24 22”) (14+ 2? + 2°)’ (-1 42°) 


in F3[2] and 
g(x) = (1+ 2?) (—2+ 22) (1+ 27 +2°) 
in F3[x]. Find gcd (f(x), g(x)) and lem (f(x) , g(«)). 


28. Find two polynomials u(x) and v(x) in F_2[x] such that deg(u(x)) < 5, 
deg(v(x)) < 4 and 


u(x) (l+a+2°)+v(2)(l+ae+e2?+23 +21) =1 
29. Construct the addition and multiplication tables for the ring 
F_3[x]/(x^2 + 1) 
30. Determine all the subfields in Fos. 
Bounds in coding 
18th November 2005 


Definition 39. A prime power is a prime or an integer power of a prime. 
§


Example 24. Examples of prime powers are, 
2,3,4,5, 7,8, 9,11, 13,16,17,... 


Definition 40. Let the alphabet be F_q, in other words a Galois field GF(q), 
where q is a prime power, and let the vector space V(n, q) be (F_q)^n. Then a 
linear code over GF(q), for some positive integer n, is a subspace of V(n, q). 


§ 


Theorem 24. A non-empty subset C of V(n, q) is a linear code if and only if, 
a. u + v ∈ C for all u and v in C 
b. au ∈ C for all u ∈ C and a ∈ GF(q) 


Proof. The proof follows from Definition 40 since, if C is a subspace, it must be 
closed under addition and scalar multiplication; conversely, by Theorem 25 a non-empty 
subset closed under both operations is a subspace. ∎


Example 25. A binary code is linear if and only if the sum of any two code 
words is a code word. 
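Example 25 gives a simple test for linearity of a binary code. A minimal sketch, assuming code words are tuples of 0s and 1s; the function name is illustrative. Over GF(2) the only scalars are 0 and 1, so closure under scalar multiplication only asks for the zero word, which u + u supplies.

```python
def is_binary_linear(code):
    """A non-empty binary code is linear iff u + v is a code word for all
    code words u and v (addition is componentwise mod 2)."""
    words = {tuple(w) for w in code}
    if not words:
        return False
    # The pairs with u == v also check that the zero word u + u is present.
    return all(tuple(x ^ y for x, y in zip(u, v)) in words
               for u in words for v in words)

even_weight = [(0, 0, 0), (1, 1, 0), (0, 1, 1), (1, 0, 1)]
print(is_binary_linear(even_weight))                       # True
print(is_binary_linear([(0, 0, 0), (1, 1, 1), (1, 0, 0)]))  # False: 111 + 100 = 011
```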


Definition 41. A vector space V is a set which is closed under finite vector 
addition and scalar multiplication. If the scalars are members of a field F, 
then V is called a vector space over F. Furthermore, V is a vector space 
over F if and only if for all members of V and F the following properties 
hold under addition, 

a. commutativity 

b. associativity 

c. existence of an identity 

d. existence of an inverse 


while under multiplication the following, 
e. associativity under scalar multiplication 
f. distributivity of scalar sum 
g. distributivity of vector sum 
h. existence of a scalar multiplication identity 
In other words, for all x, y and z in V and all r and s in F, 


a. x + y = y + x 
b. (x + y) + z = x + (y + z) 
c. 0 + x = x + 0 = x 
d. x + (−x) = 0 
e. r(sx) = (rs)x 
f. (r + s)x = rx + sx 
g. r(x + y) = rx + ry 
h. 1x = x 
Example 26. Let q be a prime power, and let GF(q) denote a finite field 
of q elements. Then, by a vector space over a finite field we mean the set GF(q)^n 
of all ordered n-tuples over GF(q), which is closed under finite vector addition 
and scalar multiplication, that is to say, multiplication by some scalar a in GF(q). 


Theorem 25. A non-empty subset C of V(n, q) is a subspace if and only 
if C is closed under addition and scalar multiplication. 


Proof. What Theorem 25 states amounts to saying that a non-empty C in 
V(n, q) is a subspace if and only if, 

a. x, y ∈ C implies x + y ∈ C 

b. if a ∈ GF(q) and x ∈ C, then ax ∈ C 
Statements (a) and (b) are necessary for C to be a subspace, since a subspace is itself 
a vector space and hence closed under both operations. They are also sufficient: the 
properties (a) to (h) of Definition 41 hold for all vectors of V(n, q), and therefore for those of C, 
while (b) with a = 0 and a = −1 shows that C contains the zero vector 0x and the inverse 
(−1)x = −x of each of its elements. ∎


Definition 42. A linear combination of r vectors v_1, ..., v_r in V(n, q) is 
any vector of the form Σ_{i=1}^{r} a_i v_i, where a_i are scalars. Let A be a set of 
vectors {v_1, ..., v_r}. Then A is said to be linearly dependent if there exist 
scalars a_1, ..., a_r, not all of which are zero, such that Σ_{i=1}^{r} a_i v_i = 0. And 
A is linearly independent if it is not linearly dependent, that is to say, if 
Σ_{i=1}^{r} a_i v_i = 0 implies that a_i are all zero for i = 1, ..., r. 


§ 


Definition 43. Let C be a subspace of a vector space V(n, q) over GF(q). 
Then a subset {v_1, ..., v_k} of C is called a generating or spanning set of C if 
every vector in C can be expressed as a linear combination of v_1, ..., v_k. A 
basis of C is a generating set of C which is also linearly independent. 


§ 


Definition 44. For a q-ary (n, m, d)-code C, the relative minimum distance 
of C is defined to be 

$$\delta(C) = \frac{d - 1}{n}$$


§ 


Definition 45. Let a code alphabet A be of size q > 1, n the length of each 
word, d the minimum distance, and A_q(n, d) the largest possible vocabulary 
size m such that there exists an (n, m, d)-code over A. Then any (n, m, d)-code 
C which has m = A_q(n, d) is called an optimal code. The main coding 
theory problem is precisely to find the value of A_q(n, d). 

§


Definition 46. Consider each word as an n-tuple. Then all such tuples 
lying within Hamming distance r of an n-tuple x are said to be within a 
Hamming sphere of radius r around x. 


§ 
Theorem 26. Let the size of the alphabet be q = |A|, the length of a word 
be n, and the minimum distance be d. Then the Hamming or 
sphere-packing bound on the size m of a code dictionary C is given by, 


$$m \le \frac{q^n}{\sum_{i=0}^{r} (q-1)^i \binom{n}{i}}, \qquad r = \left\lfloor \frac{d-1}{2} \right\rfloor$$


Proof. Let c be a code word. Let e(x, y) be the number of places in which two words 
x and y differ. Since there are q − 1 possibilities for each 
differing position between any two words, there are (q − 1)^i possible errors 
when i places are different. And to position these i places there are altogether 
(n choose i) ways. Therefore the number of all words w such that e(w, c) ≤ r, that is 
the number n_r of n-tuples in a Hamming sphere of radius r around c, is, 


$$n_r = \sum_{i=0}^{r} (q-1)^i \binom{n}{i}$$


By the choice of r we have d(C) ≥ 2r + 1, that is to say, d(C) > 2r. 
Hence the Hamming spheres of radius r around the m code words of C 
are mutually nonintersecting, since a word within distance r of two code words would put 
them within distance 2r of each other. There are a total of q^n possible n-tuples, that 
is words of length n, not all of which are code words. And since each of the m disjoint 
spheres contains n_r of these n-tuples, the spheres together contain n_r m distinct 
n-tuples, which cannot exceed q^n. Hence, 


$$m \sum_{i=0}^{r} (q-1)^i \binom{n}{i} \le q^n$$


and thus this theorem. ∎
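The sphere-packing bound of Theorem 26 is easy to evaluate exactly with integer arithmetic. A minimal sketch, with illustrative function names; math.comb needs Python 3.8 or later. The perfect-code test follows Definition 47: equality in the bound.

```python
from math import comb

def sphere_size(n, r, q=2):
    """Number of q-ary n-tuples within Hamming distance r of a fixed word."""
    return sum(comb(n, i) * (q - 1) ** i for i in range(r + 1))

def hamming_bound(n, d, q=2):
    """Largest m with m * sphere_size(n, floor((d - 1)/2), q) <= q^n."""
    return q ** n // sphere_size(n, (d - 1) // 2, q)

def is_perfect(n, m, d, q=2):
    """An (n, m, d)-code is perfect when it meets the bound with equality."""
    return m * sphere_size(n, (d - 1) // 2, q) == q ** n

print(hamming_bound(7, 3), is_perfect(7, 16, 3))   # 16 True
print(hamming_bound(10, 3))
```

The first line corresponds to the binary Hamming code of length 7, which has 16 code words and minimum distance 3, so it meets the bound exactly.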


Definition 47. Codes which attain the Hamming bound with equality are called perfect 
codes. 


§ 


Problem 1. Let r and n be integers such that 0 < r < n/2. Then prove that, 


$$\left[8n\,\frac{r}{n}\Big(1-\frac{r}{n}\Big)\right]^{-1/2} 2^{\,nH\left(\frac{r}{n},\,1-\frac{r}{n}\right)} \le \sum_{i=0}^{r}\binom{n}{i} \le 2^{\,nH\left(\frac{r}{n},\,1-\frac{r}{n}\right)}$$


where H(x, y) is the entropy function whose arguments x and y are 
probabilities, and H(·,·) has the unit of bits per symbol. (Hint: use Stirling's 
approximation to n!, cf MacWilliams and Sloane, 1977.) 
§


Note 1. Let C(n, d) be a code with words of length n and minimum distance 
between words d. Let m_{n,d} be the number of code words in C(n, d). Then the 
size of the largest dictionary of n-tuples with fractional minimum distance d_f 
is, 


$$m_m(n, d_f) = \max_{\{C(n,d)\,:\,d/n \ge d_f\}} |C(n, d)|$$


§ 


Problem 2. From Note 1, show that for n fixed, m_m(n, d_f) is a monotone 
nonincreasing function of d_f. Then show that with d_f fixed, m_m(n, d_f) 
increases exponentially with n. 


§
Definition 48. The asymptotic transmission rate is defined to be, 

$$R(d_f) = \lim_{n\to\infty} \frac{1}{n}\log m_m(n, d_f)$$

Also defined are the upper and the lower bounds on this rate, 

$$\overline{R}(d_f) = \limsup_{n\to\infty} \frac{1}{n}\log m_m(n, d_f)$$

and 

$$\underline{R}(d_f) = \liminf_{n\to\infty} \frac{1}{n}\log m_m(n, d_f)$$
§
Note 2. For large n, show that 

$$\underline{R}(d_f) \le R(d_f) \le \overline{R}(d_f)$$
§


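Example 27. Using the results from Problem 1 we obtain the Hamming 
bound for the binary code, 


$$m \le \frac{2^n}{\sum_{i=0}^{r}\binom{n}{i}} \le \left[8n\,\frac{r}{n}\Big(1-\frac{r}{n}\Big)\right]^{1/2} 2^{\,n\left(1 - H\left(\frac{r}{n},\,1-\frac{r}{n}\right)\right)} \tag{16}$$


Equation 16 must hold for all binary dictionaries, therefore it gives an upper 
bound on the maximum dictionary size m_m(n, d_f) over all dictionaries whose 
word length is n and fractional distance, 


$$d_f = \frac{d}{n} = \frac{2r + \{1 \text{ or } 2\}}{n}$$


where the choice of 1 or 2 depends on whether d is odd or respectively even. 
For large n, r/n ≈ d_f/2, so that 


$$m_m(n, d_f) \lesssim \left[8n\,\frac{d_f}{2}\Big(1-\frac{d_f}{2}\Big)\right]^{1/2} 2^{\,n\left(1 - H\left(\frac{d_f}{2},\,1-\frac{d_f}{2}\right)\right)}$$


The upper bound for the attainable information rate is, 


$$\overline{R}(d_f) = \limsup_{n\to\infty}\frac{1}{n}\log_2 m_m(n, d_f) \le \lim_{n\to\infty}\left\{\frac{\log_2 n}{2n} + \frac{1}{2n}\log_2\!\Big(8\,\frac{d_f}{2}\Big(1-\frac{d_f}{2}\Big)\Big) + 1 - H\Big(\frac{d_f}{2},\,1-\frac{d_f}{2}\Big)\right\}$$


As n approaches infinity, 


$$\overline{R}(d_f) \le 1 - H\Big(\frac{d_f}{2},\,1-\frac{d_f}{2}\Big)$$
Problem 3. Work out the details of the derivation in Example 27. 
§ 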


Theorem 27. Let d(c_i, c_j) be the Hamming distance between the code 
words c_i and c_j. Let d = d(C) be the minimum distance between code words, and 
d̄ the average distance between words. If, 


$$\frac{d}{n} > \frac{q-1}{q}$$


then Plotkin's bound, 


$$m_{n,d} \le \frac{qd}{qd - (q-1)n}$$


holds. 


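Proof. The average distance gives an upper bound for the minimum distance, 
that is d ≤ d̄, where 


$$\bar{d} = \frac{2}{m(m-1)}\sum_{i=2}^{m}\sum_{j=1}^{i-1} d(c_i, c_j)$$


Since Plotkin's bound is an upper bound on d, we need to maximise, 


$$\sum_{i>j} d(c_i, c_j) = \sum_{i>j}\sum_{k=1}^{n} d(c_{ik}, c_{jk}) = \sum_{k=1}^{n}\sum_{i>j} d(c_{ik}, c_{jk})$$


where c_{ik} is the k-th symbol of c_i. This implies (cf Plotkin, 1960), 


$$\sum_{i>j} d(c_i, c_j) \le \sum_{k=1}^{n}\ \max_{c_{1k},\dots,c_{mk}\in A}\ \sum_{i>j} d(c_{ik}, c_{jk})$$


which says that the upper bound is maximised by choosing maximising symbols c_{ik} 
from the alphabet A, one position at a time. If the symbol a occurs N_a times in position k, 
the number of disagreeing pairs in that position is (m^2 − Σ_a N_a^2)/2, and since 
Σ_a N_a^2 ≥ m^2/q by the Cauchy-Schwarz inequality, 


$$\max_{c_{1k},\dots,c_{mk}\in A}\ \sum_{i>j} d(c_{ik}, c_{jk}) \le \frac{q-1}{2q}\,m^2$$


Combining these with d ≤ d̄ gives (m(m − 1)/2) d ≤ n(q − 1)m^2/(2q), that is, 
m(d − n(q − 1)/q) ≤ d. Providing that, 


$$d > \frac{(q-1)n}{q}$$


then 


$$m \le \frac{qd}{qd - (q-1)n}$$
∎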
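Theorem 27 can be evaluated directly whenever its hypothesis holds. A minimal sketch with an illustrative function name; it returns None when qd ≤ (q − 1)n, because the bound then says nothing, and otherwise the floor of qd/(qd − (q − 1)n), since m is an integer.

```python
def plotkin_bound(n, d, q=2):
    """Theorem 27: if q*d > (q-1)*n then m <= q*d / (q*d - (q-1)*n)."""
    num, den = q * d, q * d - (q - 1) * n
    if den <= 0:
        return None        # hypothesis d/n > (q-1)/q fails
    return num // den

for n, d in ((3, 3), (5, 3), (6, 4), (7, 3)):
    print(n, d, plotkin_bound(n, d))
```

For n = d = 3 the bound gives 2, which the binary repetition code {000, 111} attains.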
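Note 3. Notice how, for a summand symmetric in i and j such as d(c_i, c_j), 


$$\sum_{i=1}^{m-1}\sum_{j=i+1}^{m} (\cdot) = \sum_{i=2}^{m}\sum_{j=1}^{i-1} (\cdot)$$


Equivalent to these are, respectively, 


$$\sum_{i<j} (\cdot) \quad\text{and}\quad \sum_{i>j} (\cdot)$$
§
Problem 4. Prove Note 3 on double summations. 
§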
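Note 4. If, 


$$d_f > \frac{q-1}{q}$$


then, by Theorem 27 with d = d_f n, 


$$m_m(n, d_f) \le \frac{d_f}{d_f - \frac{q-1}{q}}$$


and then, 


$$\overline{R}(d_f) = \limsup_{n\to\infty}\frac{1}{n}\log m_m(n, d_f) = 0$$


On the other hand if, 


$$d_f \le \frac{q-1}{q}$$


then we start from, 


$$m(n, d) = \sum_{x\in A} m_x(n, d)$$


where m(n, d) = |C(n, d)|, C(n, d) being any code consisting of n-tuples whose 
minimum distance is at least d, and m_x(n, d) = |C_x(n, d)|, C_x(n, d) comprising 
all n-tuples in C(n, d) which begin with the symbol x. Deleting the common first symbol 
from the words of C_x(n, d) leaves a code of length n − 1 with minimum distance at least d. 
Hence, taking x with the largest m_x(n, d), and letting m(n − 1, d) stand for the largest such 
code size, 


$$m(n, d) \le q\,m_x(n, d) \le q\,m(n-1, d) \le \cdots \le q^{n-k} m(k, d)$$


Provided k is small enough, we may yet use Plotkin's bound, hence 


$$m(n, d) \le q^{n-k}\,\frac{d}{d - k\frac{q-1}{q}}$$


when 


$$\frac{d}{k} > \frac{q-1}{q}$$


Choose k the largest integer satisfying 


$$\frac{d}{k} - \frac{1}{qk} \ge \frac{q-1}{q}$$


Then, writing 


$$k + r = \frac{qd-1}{q-1}, \qquad 0 \le r < 1$$


we have 


$$d - k\,\frac{q-1}{q} = \frac{(q-1)r + 1}{q}$$


Finally, using Problem 5, 


$$m(n, d) \le q^{n-k}\,\frac{qd}{(q-1)r + 1} = qd\,q^{\,n - \frac{qd-1}{q-1}}\,\frac{q^r}{(q-1)r+1} \le qd\,q^{\,n - \frac{qd-1}{q-1}}$$


and, if d_f is fixed and n becomes large, 


$$\overline{R}(d_f) \le \log q\,\Big(1 - \frac{q}{q-1}\,d_f\Big)$$
§
Problem 5. Prove that, 


$$\frac{q^r}{(q-1)r + 1} \le 1$$


for 0 ≤ r < 1. 


§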
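The asymptotic bounds of Example 27 (binary Hamming) and Note 4 (Plotkin) can be compared numerically. This sketch assumes rates are measured in bits, so the Plotkin bound of Note 4 is taken as log2(q)(1 − q d_f/(q − 1)), clipped at 0; the function names are illustrative.

```python
from math import log2

def binary_entropy(p):
    """H(p, 1 - p) in bits, with the convention 0 log 0 = 0."""
    if p in (0, 1):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def hamming_rate_bound(df):
    """Example 27 (binary): R(df) <= 1 - H(df/2, 1 - df/2)."""
    return 1 - binary_entropy(df / 2)

def plotkin_rate_bound(df, q=2):
    """Note 4: R(df) <= log2(q) * (1 - q*df/(q - 1)), and 0 beyond df = (q-1)/q."""
    return log2(q) * max(0.0, 1 - q * df / (q - 1))

for df in (0.1, 0.2, 0.3, 0.4, 0.5):
    print(f"df={df:.1f}  Hamming {hamming_rate_bound(df):.4f}  "
          f"Plotkin {plotkin_rate_bound(df):.4f}")
```

At small d_f the Hamming bound is the tighter of the two, while at larger d_f the Plotkin bound takes over.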


Proposition 1. Let C be a code containing m binary n-tuples, and m_d(x) the 
number of code words within distance d of an n-tuple x. Further, let A be a 
new code whose code words are the difference vectors a_1, ..., a_{m_a}, such that 
a_i = c_i ⊖ x, i = 1, ..., m_a, where ⊖ denotes modulo subtraction of the vectors, 
element by element, and c_1, ..., c_{m_a} are the code words within distance d of x. 
Assume that d < n/2 and that both d and m are large enough 
that m_d(x) ≥ 2. Then, 


$$d_c \le 2d\Big(1 - \frac{d}{n}\Big)\frac{m_a}{m_a - 1} \tag{17}$$


where d_c is the minimum distance of C and m_a = |A| = m_d(x). 


Proof. There are 


$$\sum_{i=0}^{d}\binom{n}{i}$$


n-tuples within distance d of each code word. This gives a total of 


$$m\sum_{i=0}^{d}\binom{n}{i}$$


n-tuples, counted with multiplicity, in the Hamming spheres around the m code words. 

There are m_d(x) code words within the distance d of any n-tuple x. For x 
in X^n, c in C and d(x, c) ≤ d, the number of pairs (x, c) can be counted either by 
picking first x and then c, or by picking first c and then x, hence 


$$\sum_{x\in X^n} m_d(x) = m\sum_{i=0}^{d}\binom{n}{i}$$


Since X^n contains 2^n n-tuples, there consequently exists some value of x 
such that, 


$$m_d(x) \ge \left\lceil \frac{m}{2^n}\sum_{i=0}^{d}\binom{n}{i} \right\rceil$$
Let c_1, ..., c_{m_a} be the code words in C that lie within Hamming distance d of the 
n-tuple x. Consider the difference vectors a_1, ..., a_{m_a}, such that a_i = c_i ⊖ x. 
Then A is a set of localised code words of C. Then, 


a_i ⊖ a_j = (c_i ⊖ x) ⊖ (c_j ⊖ x) = c_i ⊖ c_j 
and we have, 
d(c_i, c_j) = d(a_i, a_j) 
Thus, 


$$m_a = m_d(x) \ge \left\lceil \frac{m}{2^n}\sum_{i=0}^{d}\binom{n}{i} \right\rceil$$


Also, d_a ≥ d_c, where d_a is the minimum distance of A, and w(a_i) ≤ d for every n-tuple a_i in A, 
where the Hamming weight w(a_i) is the number of nonzero elements in a_i. 


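Next, applying the average-distance Plotkin bound to the localised code A 
one obtains, 


$$d_c \le d_a \le \bar{d}_a = \frac{2}{m_a(m_a - 1)}\sum_{i>j} d(a_i, a_j) \tag{18}$$


We maximise the RHS of Equation 18 to get rid of the dependence on A. We 
relax our restriction on w(a_i) above to a constraint on the total weight of all a_i in A, thus, 


$$\sum_{a_i\in A} w(a_i) \le m_a d \tag{19}$$


Then, let z_k be the number of code words in A having a 0 in the k-th position. 
We maximise, 


$$\sum_{i>j} d(a_i, a_j) = \sum_{k=1}^{n} z_k\,(m_a - z_k) \tag{20}$$


subject to the constraint of Equation 19 that, 


$$\sum_{k=1}^{n} (m_a - z_k) \le m_a d \tag{21}$$


By setting, 


$$m_a - z_k = \frac{m_a d}{n} \tag{22}$$


we maximise the RHS of Equation 20 under the constraint in Equation 21; this is where d < n/2 
is used, since each term z_k(m_a − z_k) increases with m_a − z_k as long as m_a − z_k ≤ m_a/2. 
Substituting Equation 22 into Equation 20 gives Σ_{i>j} d(a_i, a_j) ≤ m_a^2 d(1 − d/n), and so from 
Equations 18, 20 and 22 we obtain Equation 17. ∎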
Algorithm 2 Gilbert bound, a lower bound to m for n, d and q. 
S ← X^n 
for all c_i remaining in S do 
    for all n-tuples c_j ≠ c_i in S within distance d − 1 of c_i do 
        remove c_j from S 
    endfor 
endfor 
Note 5. For the Gilbert bound algorithm, Algorithm 2, initially |S| = 
|X|^n = q^n. For each c_i chosen, at most 


$$\sum_{i=0}^{d-1} (q-1)^i\binom{n}{i} \tag{23}$$


n-tuples are removed. If 


$$(m-1)\sum_{i=0}^{d-1} (q-1)^i\binom{n}{i} < q^n$$


then the algorithm will not stop after m − 1 code-word selections. 
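Algorithm 2 can be run directly for small parameters. This sketch scans all q-ary n-tuples in lexicographic order and keeps a word only if it is at distance at least d from every word kept so far, which has the same effect as removing the (d − 1)-neighbourhood of each selected word. The scanning order and the function names are assumptions made for illustration. By Note 5 the number of selections is at least q^n divided by the sum in Equation 23, rounded up.

```python
from itertools import product
from math import comb

def hamming_distance(x, y):
    return sum(a != b for a, b in zip(x, y))

def gilbert_greedy(n, d, q=2):
    """Greedy code of length n and minimum distance >= d (in the spirit of Algorithm 2)."""
    code = []
    for word in product(range(q), repeat=n):
        # keep the word only if no previously kept word lies within distance d - 1
        if all(hamming_distance(word, c) >= d for c in code):
            code.append(word)
    return code

def gilbert_bound(n, d, q=2):
    """Guaranteed number of selections: ceil(q^n / sum_{i<d} (q-1)^i C(n, i))."""
    removed_per_step = sum(comb(n, i) * (q - 1) ** i for i in range(d))
    return -(-q ** n // removed_per_step)

code = gilbert_greedy(7, 3)
print(len(code), gilbert_bound(7, 3))
```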
Group-, polynomial-, and Hamming codes 
25th November 2005 


A recapitulation of group, ring, field and finite field is given in Definition 
49 and Example 28. 


Definition 49. A non-empty set G with a binary composition is called 
a group if the composition is associative, if a unique identity exists for all 
elements in G, and if a unique inverse exists for each of the elements in G. 
The group G is called Abelian if the composition in it is commutative for 
any two elements in G. A non-empty set R with two binary compositions, 
call these addition and multiplication, defined on it is called a ring if R is an 
Abelian group with respect to the composition addition, if multiplication in 
R is associative, and if distributive laws hold for all elements in R. A set F 
having at least two elements with two compositions, call them addition 
and multiplication, defined on it is called a field if it is a commutative ring 
with identity every non-zero element of which has an inverse with respect to 
multiplication. A field having only a finite number of elements is called a 
finite or Galois field. 

§


Example 28. The set F_p = {0, ..., p − 1}, in which addition and multiplication 
are defined modulo p, where p is a prime integer, is a finite field. 
For p = 2 we have F_2 = {0, 1}, which is denoted by B. The set B^n of all 
ordered n-tuples, or sequences of length n, where n is a positive integer, with each 
entry of the sequence in the field B, together with the composition defined as the 
componentwise summation of any two sequences in B^n, is an Abelian group. 
The zero sequence of length n is the identity of B^n and each element in B^n 
is its own inverse. 


Definition 50. A binary block (b, n)-code comprises an encoding function 
E : B^b → B^n and a decoding function D : B^n → B^b. The images of E are 
called code words. 


§ 


Definition 51. Let a and b be two binary sequences in B^n. The distance 
d(a, b) between a and b is defined as 


$$d(a, b) = \sum_{i=1}^{n} d_i$$


where 


$$d_i = \begin{cases} 1 & \text{if } a_i \ne b_i \\ 0 & \text{if } a_i = b_i \end{cases}$$
§


Definition 52. The weight w(a) of a in B^n is the number of non-zero 
components of the sequence a. 
§


Theorem 28. Let a and b be any two sequences in B^n. Then d(a, b) = 
w(a + b). 


Proof. Position i, for 1 ≤ i ≤ n, contributes 1 to d(a, b) if and only if a_i ≠ b_i. 
But this is the case if and only if a_i + b_i = 1, which is exactly when position i 
contributes 1 to w(a + b). ∎
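When binary sequences are stored as integer bit masks, the sum a + b in B^n is bitwise XOR, so Theorem 28 says that distance is the population count of a ^ b. A minimal sketch that checks this exhaustively for n = 5; the representation and function names are illustrative.

```python
def weight(a):
    """w(a): number of 1 bits of a binary sequence stored as an integer bit mask."""
    return bin(a).count("1")

def distance(a, b, n):
    """d(a, b) of Definition 51: positions i < n where the bits of a and b differ."""
    return sum(((a >> i) & 1) != ((b >> i) & 1) for i in range(n))

# In B^n, addition a + b is componentwise mod 2, i.e. bitwise XOR.
n = 5
print(all(distance(a, b, n) == weight(a ^ b)
          for a in range(2 ** n) for b in range(2 ** n)))
```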

Definition 53 recapitulates the concept of homomorphism, whereas Definition 
54 defines what a group code is. 


Definition 53. Let X and Y be two groups. Then a map f : X → Y which 
satisfies the property f(x_1x_2) = f(x_1)f(x_2) for all x_1 and x_2 in X is called a 
homomorphism. Further, the homomorphism f is called a monomorphism if 
it is one to one, and it is called an isomorphism if it is both one to one and 
onto. 


§ 


Definition 54. A block code is called a group code if all its code words form 
an additive group. 


§ 


Definition 55. A b × n matrix G over B, where b < n, is called an encoding 
or generator matrix if G is of the form 


$$G = [\,I_b \;\; A\,]$$


where I_b is the identity matrix of dimension b and A is a b × (n − b) matrix. 
An encoding function E : B^b → B^n is defined by 


E(x) = xG 


for all x in B^b 
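Matrix encoding E(x) = xG is a vector-matrix product with arithmetic mod 2. The sketch below assumes a hypothetical systematic generator matrix G = [I_4 A] of a (4, 7) code; the particular A is chosen for illustration only and is not taken from the notes. Because G starts with I_4, each code word begins with its message word, which is the injectivity argument of Theorem 29.

```python
from itertools import product

# Hypothetical generator matrix G = [I_4 A]; A (last three columns) is illustrative.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def encode(x, G):
    """E(x) = xG over B: component j is sum_i x_i * G[i][j] mod 2."""
    n = len(G[0])
    return [sum(xi * row[j] for xi, row in zip(x, G)) % 2 for j in range(n)]

print(encode([1, 0, 1, 1], G))
codewords = {tuple(encode(list(x), G)) for x in product((0, 1), repeat=4)}
print(len(codewords))   # 16 distinct code words: E is one to one
```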


§ 


Theorem 29. The encoding function $E : B^b \to B^n$ given by $E(x) = xG$
for all $x$ in $B^b$, where $G$ is a $b \times n$ generator matrix, is a monomorphism.


Proof. Both $B^b$ and $B^n$ are additive Abelian groups. For all $x$ and $y$
in $B^b$ we know that $x + y$ is also in $B^b$ and $E(x+y) = (x+y)G = xG + yG =
E(x) + E(y)$. Thus $E$ is a homomorphism. Further, as the first $b$ columns of $G$ form
$I_b$, the first $b$ digits of $E(x)$ are $x$ itself. Therefore the matrix encoding
method gives a distinct code word for each binary message word. In other
words, the mapping $E$ is one to one, which means that it is a monomorphism.


□
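Theorem 29 can be checked by brute force for a small generator matrix. This Python sketch assumes message and code words are tuples of 0/1 integers, and uses as an illustration the $3 \times 6$ matrix $G = (I_3\ A)$ that appears in Example 29; the helper names `encode` and `add` are hypothetical. It confirms that $E(x) = xG$ is additive and that the first $b$ digits of $E(x)$ reproduce $x$, so distinct messages receive distinct code words.

```python
from itertools import product

def encode(x, G):
    """E(x) = xG over B: each code digit is a sum modulo 2 (Definition 55)."""
    n = len(G[0])
    return tuple(sum(x[i] * G[i][j] for i in range(len(x))) % 2 for j in range(n))

def add(u, v):
    """Componentwise addition modulo 2."""
    return tuple((p + q) % 2 for p, q in zip(u, v))

# G = (I_3 A), the generator matrix of Example 29.
G = [(1, 0, 0, 1, 1, 1),
     (0, 1, 0, 1, 0, 1),
     (0, 0, 1, 0, 1, 1)]

messages = list(product((0, 1), repeat=3))
codewords = [encode(x, G) for x in messages]

# Homomorphism: E(x + y) = E(x) + E(y) for all pairs of messages.
assert all(encode(add(x, y), G) == add(encode(x, G), encode(y, G))
           for x in messages for y in messages)

# One to one: the first b digits of E(x) are x itself.
assert all(c[:3] == x for x, c in zip(messages, codewords))
assert len(set(codewords)) == len(messages)

print(encode((1, 0, 1), G))  # (1, 0, 1, 1, 0, 0)
```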


Definition 56. A code generated by a generating matrix is called a matrix 
code. 


§ 
Theorem 30. A matrix code is a group code.


Proof. Let $G$ be the generator matrix. The code words are closed under addition, since
$x_1G + x_2G = (x_1 + x_2)G$, and their addition is associative, since

$$x_1G + (x_2G + x_3G) = (x_1G + x_2G) + x_3G.$$

They have a unique identity, namely the zero word $0G$ of length $n$, and each of them
is its own inverse. □


Definition 57. A $(b, b+1)$ parity check code is the code generated by the
encoding function $E : B^b \to B^{b+1}$ defined by

$$E(a_1 \cdots a_b) = a_1 \cdots a_b\, a_{b+1},$$

where

$$a_{b+1} = \begin{cases} 1 & \text{if } w(a) \text{ is odd}, \\ 0 & \text{if } w(a) \text{ is even}, \end{cases}$$

$w(a)$ being $w(a_1 \cdots a_b)$.


Theorem 31. The $(b, b+1)$ parity check code is a group code.


Proof. Let our unencoded binary words be $a = a_1 \cdots a_b$, $b = b_1 \cdots b_b$ and
$c = c_1 \cdots c_b$, where $c_i = a_i + b_i$ for $i = 1, \ldots, b$, and let the code words
of $a$, $b$ and $c$ be $\bar a = a\,a_{b+1}$, $\bar b = b\,b_{b+1}$ and $\bar c = c\,c_{b+1}$ respectively.
The weight of $c$ is odd if and only if one of $a$ and $b$ has odd weight and the other
even weight; in that case either $a_{b+1} = 1$ and $b_{b+1} = 0$, or $a_{b+1} = 0$ and $b_{b+1} = 1$.
Either way $c_{b+1} = 1 = a_{b+1} + b_{b+1}$. Next, $w(c)$ is even if and only if $w(a)$ and $w(b)$
are both odd or both even, and then $a_{b+1} + b_{b+1} = 0 = c_{b+1}$. Hence $\bar a + \bar b = \bar c$
is a parity check code word, so the code is closed under addition. The zero word
is the identity and the inverse of each word is that word itself. Therefore the
set of all code words forms a group. □
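The closure argument of Theorem 31 can also be confirmed exhaustively. The sketch below, in Python, assumes words are tuples of 0/1 integers and takes $b = 4$ for illustration; `parity_encode` is a hypothetical name for the encoding function $E$ of Definition 57.

```python
from itertools import product

def parity_encode(a):
    """E(a_1 ... a_b) = a_1 ... a_b a_{b+1}, where a_{b+1} = w(a) mod 2."""
    return tuple(a) + (sum(a) % 2,)

def add(u, v):
    """Componentwise addition modulo 2."""
    return tuple((p + q) % 2 for p, q in zip(u, v))

b = 4
code = {parity_encode(a) for a in product((0, 1), repeat=b)}

# Closure: the sum of two code words is again a code word.
assert all(add(u, v) in code for u in code for v in code)
# Identity and inverses: the zero word is a code word and u + u = 0.
zero = (0,) * (b + 1)
assert zero in code and all(add(u, u) == zero for u in code)

# Every code word has an even number of ones.
print(len(code), all(sum(c) % 2 == 0 for c in code))  # 16 True
```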


Theorem 32. The minimum distance of a group code equals the minimum
of the weights of its non-zero code words.


Proof. Let $d_m$ be the minimum distance of the group code, and $w_m$ the
minimum of the weights of its non-zero code words. There exist distinct code words
$a$ and $b$ such that $d_m = d(a,b) = w(a+b)$ by Theorem 28; since the code is a group,
$a + b$ is a non-zero code word, so $d_m \ge w_m$. Conversely, there exists a non-zero
code word $c$ such that $w_m = w(c) = d(c,0) \ge d_m$, since the zero word is also a
code word. Hence $d_m = w_m$. □
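Theorem 32 saves work: instead of comparing all $\binom{2^b}{2}$ pairs of code words, one need only inspect the $2^b - 1$ non-zero code words. The Python sketch below, which again assumes tuples of 0/1 integers and borrows the generator matrix of Example 29 purely for illustration, computes both quantities and confirms that they agree.

```python
from itertools import combinations, product

# Generator matrix of Example 29 (b = 3, n = 6).
G = [(1, 0, 0, 1, 1, 1),
     (0, 1, 0, 1, 0, 1),
     (0, 0, 1, 0, 1, 1)]

def encode(x):
    """xG over B."""
    return tuple(sum(x[i] * G[i][j] for i in range(3)) % 2 for j in range(6))

code = [encode(x) for x in product((0, 1), repeat=3)]

# Minimum distance over all 28 pairs of distinct code words.
d_min = min(sum(p != q for p, q in zip(u, v)) for u, v in combinations(code, 2))
# Minimum weight over the 7 non-zero code words.
w_min = min(sum(c) for c in code if any(c))

print(d_min, w_min)  # 3 3
```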


Example 29. Let the generator matrix be

$$G = \begin{pmatrix} 1 & 0 & 0 & 1 & 1 & 1 \\ 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 \end{pmatrix}.$$
The dimension of $G$ is $b \times n$, which in this case is $3 \times 6$. Let $a_1a_2a_3a_4a_5a_6$ be
the code word and $a_1a_2a_3$ the original word. Then

$$(a_1\ a_2\ a_3\ a_4\ a_5\ a_6) = (a_1\ a_2\ a_3)\,G,$$

and so

$$a_4 = a_1 + a_2, \qquad a_5 = a_1 + a_3, \qquad a_6 = a_1 + a_2 + a_3.$$

In other words, since $-1 = 1$ in $B$, we have the parity check equations

$$\begin{aligned} a_1 + a_2 + a_4 &= 0, \\ a_1 + a_3 + a_5 &= 0, \\ a_1 + a_2 + a_3 + a_6 &= 0. \end{aligned}$$


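These parity check equations are then, in matrix form,

$$\begin{pmatrix} 1 & 1 & 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \\ a_5 \\ a_6 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.$$

The matrix

$$H = \begin{pmatrix} 1 & 1 & 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 0 & 0 & 1 \end{pmatrix}$$

is called the parity check matrix of the code. Then $G = (I_3\ A)$ and $H = (A^T\ I_3)$, where

$$A = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}$$

and

$$A^T = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 1 & 1 \end{pmatrix}.$$

Note 6. Let $\mathbf a$ be the column vector formed from the digits of the code word $a$.
Then, for any code word $a$, the parity check matrix $H$ has the
property that $H\mathbf a = 0$.

§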


44 14 January, 2007 God’s Ayudhya’s Defence 


Kit Tyabandha, PhD Coding Theory, notes and projections from lecture 


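Example 30. The parity check code in Definition 57 is in fact a matrix
code, given by the $b \times (b+1)$ generator matrix

$$G = \begin{pmatrix} 1 & 0 & \cdots & 0 & 1 \\ 0 & 1 & \cdots & 0 & 1 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 1 \end{pmatrix},$$

whose parity check matrix is the $1 \times (b+1)$ matrix $H = (1\ \cdots\ 1)$.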


Definition 58. The syndrome of a word $r \in B^n$ is $s = Hr^T$.


§ 


In the syndrome decoding algorithm, Algorithm 3, $r$ is a received word, $s$
its syndrome, $b_r$ the decoded original word and $c_r$ the decoded code word.


Algorithm 3 The syndrome decoding algorithm.


r ← r_1 r_2 ⋯ r_n
s ← H r^T
if s = 0 then
  c_r ← r
  b_r ← (r_1 ⋯ r_b)
elseif s matches the i-th column of H then
  c_r ← (r_1 ⋯ r_{i-1} (r_i + 1) r_{i+1} ⋯ r_n)
  b_r ← (c_{r,1} ⋯ c_{r,b})
else
  at least two errors have occurred in the transmission
endif
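Algorithm 3 can be sketched in Python as follows. The sketch assumes words are tuples of 0/1 integers and uses the parity check matrix $H$ of Example 29 (so $b = 3$, $n = 6$) for illustration. The names `syndrome` and `syndrome_decode` are hypothetical. The function returns `None` in the branch where the algorithm reports at least two errors.

```python
# Parity check matrix H = (A^T I_3) of Example 29; b = 3, n = 6.
H = [(1, 1, 0, 1, 0, 0),
     (1, 0, 1, 0, 1, 0),
     (1, 1, 1, 0, 0, 1)]
b = 3

def syndrome(H, r):
    """s = H r^T over B (Definition 58), returned as a tuple."""
    return tuple(sum(h * x for h, x in zip(row, r)) % 2 for row in H)

def syndrome_decode(H, b, r):
    """Return (b_r, c_r), or None when at least two errors have occurred."""
    s = syndrome(H, r)
    if not any(s):                        # s = 0: accept r as it stands
        return tuple(r[:b]), tuple(r)
    columns = [tuple(row[i] for row in H) for i in range(len(r))]
    if s in columns:                      # s matches the i-th column of H
        i = columns.index(s)
        c = list(r)
        c[i] ^= 1                         # replace r_i by r_i + 1
        return tuple(c[:b]), tuple(c)
    return None

# The columns of H are distinct and non-zero (the condition of Theorem 33).
cols = [tuple(row[i] for row in H) for i in range(6)]
assert len(set(cols)) == 6 and (0, 0, 0) not in cols

received = (1, 0, 1, 1, 1, 0)             # code word 101100 with its 5th digit flipped
print(syndrome_decode(H, b, received))    # ((1, 0, 1), (1, 0, 1, 1, 0, 0))
```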


Theorem 33. An $(n-b) \times n$ parity check matrix $H$ will decode all single
errors correctly if and only if the columns of $H$ are distinct and non-zero.


Proof. Suppose the $i$-th column of $H$ is zero, and let $e$ be the word of
weight 1 having 1 in the $i$-th position and 0 elsewhere. Then for any code
word $b$, we have $H(b+e)^T = Hb^T + He^T = 0$. So our decoding procedure
gives $D(b + e) = b + e$ and the error vector $e$ goes undetected.

Next, suppose that the $i$-th and the $j$-th columns of $H$ are identical. Let $e^i$
and $e^j$ be words of length $n$ with 1 in the $i$-th and respectively the $j$-th position,
and 0 elsewhere. Then for any code word $b$, we have

$$H(b+e^i)^T = Hb^T + H(e^i)^T = H(e^i)^T = H(e^j)^T = Hb^T + H(e^j)^T = H(b+e^j)^T.$$

We are unable to decide whether the error occurred in the $i$-th or the $j$-th position.

Conversely, suppose all the columns of $H$ are distinct and non-zero. Then for
any code word $b$ and any error vector $e$ of weight 1 having 1 in the $i$-th position,

$$H(b+e)^T = Hb^T + He^T = 0 + He^T,$$

which is the $i$-th column of $H$ and matches no other column. Our decoding
procedure therefore gives $D(b + e) = b$, so every single error is corrected. □


Theorem 34. If $G = (I_b\ A)$ is a $b \times n$ generator matrix of a code, then
$H = (A^T\ I_{n-b})$ is the unique parity check matrix for the same code. If
$H = (B\ I_{n-b})$ is an $(n-b) \times n$ parity check matrix, then $G = (I_b\ B^T)$
is the unique generator matrix for the same code.


Proof. Let the original word be $a \in B^b$ and let $c$ be the code word corresponding
to $a$ with respect to the code given by the generator matrix $G$. Then $c = aG$.
Let $a = a_1 \cdots a_b$. Since the first $b$ columns of $G$ form an identity matrix, it
follows from $c = aG$ that $c_i = a_i$ for all $1 \le i \le b$. Let $\hat c = c_{b+1} \cdots c_n$; then
$c = c_1 \cdots c_b c_{b+1} \cdots c_n$ and $c = (a\ \hat c)$. Then

$$\begin{aligned}
Hc^T &= (A^T\ I_{n-b})(aG)^T \\
&= (A^T\ I_{n-b})\,G^T a^T \\
&= (A^T\ I_{n-b})\,(I_b\ A)^T a^T \\
&= (A^T\ I_{n-b}) \begin{pmatrix} I_b \\ A^T \end{pmatrix} a^T \\
&= (A^T I_b + I_{n-b} A^T)\, a^T \\
&= (A^T + A^T)\, a^T \\
&= 0\, a^T \\
&= 0.
\end{aligned}$$

Therefore $c$ is the code word corresponding to the original word $a$ in the code
given by the parity check matrix $H$.
Conversely, suppose that $c$ is the code word corresponding to the original word $a$
in the code obtained from the parity check matrix $H = (A^T\ I_{n-b})$.
Then $c_i = a_i$ for all $1 \le i \le b$ and $Hc^T = 0$. Let $\hat c = c_{b+1} \cdots c_n$. Then

$$(A^T\ I_{n-b}) \begin{pmatrix} a^T \\ \hat c^T \end{pmatrix} = 0, \qquad\text{that is,}\qquad A^T a^T + I_{n-b}\,\hat c^T = 0.$$

Therefore $\hat c^T = A^T a^T$ (since $-1 = 1$ in $B$), that is $\hat c = aA$, and

$$c = (a\ \hat c) = (aI_b\ \ aA) = a(I_b\ A) = aG.$$

Hence $c$ is the code word corresponding to the original word $a$ in the code
defined by the generator matrix $G$. So far we have proved that the codes determined
by $G$ and $H$ are identical.

Suppose that to $G = (I_b\ A)$ there corresponds another parity check matrix
$H_1 = (B\ I_{n-b})$. Let $e^i$ be the original word with 1 in the $i$-th position and 0
elsewhere. The corresponding code word is $e^iG$, that is the $i$-th row of $G$, and
we may write $e^iG = (e^i\ \hat a^i)$, where $\hat a^i$ is the $i$-th row of $A$. Since $H_1$ is a
parity check matrix of the code defined by $G$, it follows that

$$0 = H_1 (e^iG)^T = (B\ I_{n-b}) \begin{pmatrix} (e^i)^T \\ (\hat a^i)^T \end{pmatrix} = B(e^i)^T + (\hat a^i)^T,$$

where $B(e^i)^T$ is the $i$-th column of $B$. Therefore $(\hat a^i)^T$ matches the $i$-th column
of $B$, or equivalently $\hat a^i$ matches the $i$-th row of $B^T$. Thus the $i$-th row of $A$ is
identical to the $i$-th column of $B$. This is true for all $1 \le i \le b$, so $B = A^T$
and therefore $H_1 = H$. Hence to a given $G$ there corresponds a unique
$H = (A^T\ I_{n-b})$. A similar argument holds if we start with a given parity
check matrix $H$. □
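Theorem 34 gives a mechanical recipe for passing between the two standard forms. The Python sketch below assumes matrices over $B$ are lists of lists of 0/1 integers. The names `parity_check_from_generator` and `generator_from_parity_check` are hypothetical. Applied to the generator matrix of Example 29, it recovers the parity check matrix given there and checks that $HG^T = 0$.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def identity(k):
    return [[1 if i == j else 0 for j in range(k)] for i in range(k)]

def parity_check_from_generator(G):
    """G = (I_b A)  ->  H = (A^T I_{n-b})."""
    b, n = len(G), len(G[0])
    At = transpose([row[b:] for row in G])        # (n - b) x b
    return [At[i] + identity(n - b)[i] for i in range(n - b)]

def generator_from_parity_check(H):
    """H = (B I_{n-b})  ->  G = (I_b B^T)."""
    m, n = len(H), len(H[0])
    b = n - m
    Bt = transpose([row[:b] for row in H])        # b x (n - b)
    return [identity(b)[i] + Bt[i] for i in range(b)]

G = [[1, 0, 0, 1, 1, 1],
     [0, 1, 0, 1, 0, 1],
     [0, 0, 1, 0, 1, 1]]
H = parity_check_from_generator(G)
print(H)  # [[1, 1, 0, 1, 0, 0], [1, 0, 1, 0, 1, 0], [1, 1, 1, 0, 0, 1]]

# Every row of G (hence every code word) satisfies H g^T = 0 over B.
assert all(sum(h * g for h, g in zip(hr, gr)) % 2 == 0 for hr in H for gr in G)
# Going back gives the original generator matrix.
assert generator_from_parity_check(H) == G
```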


Definition 59. Let $C$ be a $(b,n)$ code obtained from the generator matrix
$G = [I_b\ A]$. Then the $(n-b,n)$ matrix code generated by its parity check matrix
$H = [A^T\ I_{n-b}]$ is called the dual code $C^\perp$ of $C$.

§ 


Definition 60. Two words $x$ and $y$ are said to be in the same coset of $C$ if and
only if $y = x + c$ for some code word $c$ in $C$.
§


Theorem 35. Two words $x$ and $y$ in $B^n$ are in the same coset of $C$ if and
only if they have the same syndrome.


Proof. By Definition 60, $x$ and $y$ are in the same coset if and only if $y = x + c$
for some $c$ in $C$, which in turn is true if and only if $x + y = c \in C$. Since the
code words are exactly the words annihilated by $H$, this holds if and only if

$$\begin{aligned} H(x+y)^T &= 0 \\ H(x^T + y^T) &= 0 \\ Hx^T + Hy^T &= 0 \\ Hx^T &= Hy^T. \end{aligned}$$

□
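Theorem 35 says that the syndrome identifies the coset. The Python sketch below assumes tuples of 0/1 integers and again borrows the parity check matrix $H$ of Example 29. It sorts all $2^6$ words of $B^6$ by syndrome and checks that each class is exactly a coset $x + C$, where $C$ is the set of words with zero syndrome.

```python
from itertools import product

H = [(1, 1, 0, 1, 0, 0),
     (1, 0, 1, 0, 1, 0),
     (1, 1, 1, 0, 0, 1)]

def syndrome(x):
    """H x^T over B."""
    return tuple(sum(h * v for h, v in zip(row, x)) % 2 for row in H)

classes = {}
for x in product((0, 1), repeat=6):
    classes.setdefault(syndrome(x), set()).add(x)

code = classes[(0, 0, 0)]                  # the code words: zero syndrome

for words in classes.values():
    x = min(words)                         # any representative would do
    coset = {tuple((u + c) % 2 for u, c in zip(x, cw)) for cw in code}
    assert words == coset                  # same syndrome <=> same coset

print(len(code), len(classes))  # 8 8
```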


Next, we look at polynomial codes. Before doing so we look briefly at
vector spaces again. Definitions 61 and 62 are respectively about vector spaces
and linear independence, and Theorem 36 is on isomorphic vector spaces.


Definition 61. Let $F$ be a field. Then a non-empty set $V$ is called a vector
space over $F$ if $V$ forms an Abelian group under an addition; for every $a$ in $F$
and $v$ in $V$ there is a uniquely defined element $av$ in $V$; and for any $v$, $v_1$
and $v_2$ in $V$ and any $a$ and $b$ in $F$, $a(v_1 + v_2) = av_1 + av_2$, $(a+b)v = av + bv$,
$(ab)v = a(bv)$ and $1v = v$, $1$ being the identity of $F$.

§ 
Definition 62. Let $V$ be a vector space over a field $F$. Then a set
$\{v_1, \ldots, v_n\}$ of elements $v_i$ in $V$ is said to be linearly independent if
$a_1v_1 + \cdots + a_nv_n = 0$, for $a_1, \ldots, a_n$ in $F$, implies $a_1 = \cdots = a_n = 0$. A set
$\{v_1, \ldots, v_n\}$ is called a basis of $V$ if its elements $v_1, \ldots, v_n$ are linearly independent
over $F$ and every element of $V$ may be expressed in the form $a_1v_1 + \cdots + a_nv_n$
with all $a_i$, $i = 1, \ldots, n$, in $F$. Then $V$ is said to be of dimension $n$ over
$F$, $\dim V = n$. A map $f : V \to W$ from one vector space to another, where
$V$ and $W$ are vector spaces over the same field $F$, is called an isomorphism
if $f$ is one to one and onto and, for all $v$, $v_1$ and $v_2$ in $V$ and $a$ in $F$,

$$f(v_1 + v_2) = f(v_1) + f(v_2) \quad\text{and}\quad f(av) = af(v).$$


Theorem 36. Let two vector spaces $V$ and $W$ over the same field $F$ have
the same finite dimension. Then $V$ and $W$ are isomorphic.


Proof. Let $\dim V = \dim W = n$. Let $\{x_1, \ldots, x_n\}$ be a basis of $V$ over $F$,
and $\{y_1, \ldots, y_n\}$ a basis of $W$ over $F$. Since every element of $V$ can be
uniquely written as $a_1x_1 + \cdots + a_nx_n$ for some $a_i$ in $F$, the map $f : V \to W$
given by

$$f(a_1x_1 + \cdots + a_nx_n) = a_1y_1 + \cdots + a_ny_n,$$

for $a_i$ in $F$, is well defined. Adding two such expressions, or multiplying one by
a scalar, acts on the coefficients in the same way on both sides, so
$f(v_1 + v_2) = f(v_1) + f(v_2)$ and $f(av) = af(v)$; thus $f$ is a homomorphism.
If $f(a_1x_1 + \cdots + a_nx_n) = 0$, then $a_1y_1 + \cdots + a_ny_n = 0$, which implies
$a_1 = \cdots = a_n = 0$ by the linear independence of the $y_i$, and hence
$a_1x_1 + \cdots + a_nx_n = 0$; therefore $f$ is one to one. Since every element of $W$ is of
the form $a_1y_1 + \cdots + a_ny_n = f(a_1x_1 + \cdots + a_nx_n)$ for some
$a_1, \ldots, a_n$ in $F$, $f$ is also onto. Hence $f$ is an isomorphism. □

Definition 63 defines polynomial codes, Definition 64 matrix codes, and
Definition 65 parity check matrices.


Definition 63. Let $g(x) = g_0 + \cdots + g_kx^k$ be a polynomial in $F[x]$. We call
the polynomial code with encoding, or generating, polynomial $g(x)$ the code which
encodes each original word $a = (a_0, \ldots, a_{b-1})$ of the message, corresponding
to

$$a(x) = a_0 + a_1x + \cdots + a_{b-1}x^{b-1},$$

into the code word $b = (b_0, \ldots, b_{b+k-1})$, which corresponds to the code
polynomial

$$b(x) = b_0 + \cdots + b_{b+k-1}x^{b+k-1} = a(x)g(x).$$


§ 
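Encoding in a polynomial code is just polynomial multiplication over $B$. Here is a minimal Python sketch, assuming coefficient lists ordered from $x^0$ upwards as in Definition 63. The generating polynomial $g(x) = 1 + x + x^3$ and the message below are hypothetical choices made only for illustration.

```python
def poly_mul(a, g):
    """Coefficients of a(x)g(x) over B, lowest degree first."""
    b = [0] * (len(a) + len(g) - 1)
    for i, ai in enumerate(a):
        for j, gj in enumerate(g):
            b[i + j] ^= ai & gj          # addition in B is XOR
    return b

g = [1, 1, 0, 1]        # g(x) = 1 + x + x^3, so k = 3
a = [1, 0, 1, 1]        # a(x) = 1 + x^2 + x^3, a message of length b = 4
print(poly_mul(a, g))   # [1, 1, 1, 1, 1, 1, 1], a code word of length b + k = 7
```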


Note 7. We assume for our generating polynomial that $g_0 \neq 0$ and $g_k \neq 0$.
To justify this assumption, suppose we have $g(x) = g_0 + \cdots + g_kx^k$. If $g_0 = 0$,
then $x$ divides $g(x)$ and we may use instead the polynomial $g(x)/x = g_1 + \cdots + g_kx^{k-1}$.
If $g_k = 0$, then $g(x) = g_0 + \cdots + g_{k-1}x^{k-1}$ has degree less than $k$, and we may
use it with $k - 1$ in place of $k$. In either case our choice becomes more economical.


§ 
Theorem 37. A polynomial with coefficients in $B$ is divisible by $1 + x$ if
and only if it has an even number of non-zero terms.


Proof. Let $f(x) = a_0 + \cdots + a_nx^n$ with all $a_i$ in $B$, $i = 0, \ldots, n$, and let
$1 + x \mid f(x)$. Then there exists a polynomial $b(x)$ in $B[x]$ such that
$f(x) = (1 + x)b(x)$. Setting $x = 1$, we have $a_0 + \cdots + a_n = f(1) = 0$. Since the field $B$ is of
characteristic 2, this is only possible if the number of non-zero terms is even.
Conversely, let $f(x)$ have an even number of non-zero terms, say
$f(x) = x^{i_1} + \cdots + x^{i_{2k}}$, where $i_1 < \cdots < i_{2k}$. Rewrite this as

$$f(x) = (x^{i_1} + x^{i_2}) + \cdots + (x^{i_{2k-1}} + x^{i_{2k}}).$$

For $i < j$, $x^i + x^j = x^i(1 + x^{j-i}) = x^i(1 + x)(1 + x + \cdots + x^{j-i-1})$, which
means that $1 + x \mid x^i + x^j$. Therefore $1 + x$ divides all bracketed terms in $f(x)$,
and hence $1 + x \mid f(x)$. □
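Theorem 37 is easy to test by machine. In the Python sketch below, a binary polynomial is stored as an integer whose bit $i$ is the coefficient of $x^i$, an encoding chosen here for convenience. The helper `poly_mod` is hypothetical and performs long division over $B$. The loop confirms the theorem for every non-zero polynomial of degree less than 10.

```python
def poly_mod(f, g):
    """Remainder of f(x) on division by g(x) over B (bit i <-> coefficient of x^i)."""
    dg = g.bit_length() - 1
    while f and f.bit_length() - 1 >= dg:
        f ^= g << (f.bit_length() - 1 - dg)   # subtract x^s g(x), i.e. XOR
    return f

ONE_PLUS_X = 0b11
for f in range(1, 1 << 10):
    divisible = poly_mod(f, ONE_PLUS_X) == 0
    even_number_of_terms = bin(f).count("1") % 2 == 0
    assert divisible == even_number_of_terms

print(poly_mod(0b1011, ONE_PLUS_X))  # 1, since 1 + x + x^3 has three terms
```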


Theorem 38. If $g(x) \in B[x]$ divides no polynomial of the form $x^k - 1$ for
$k < n$, then the binary polynomial code of length $n$ generated by $g(x)$ has
minimum distance at least 3.


Proof. Let $g(x) = g_0 + \cdots + g_rx^r$, where the $g_i$ are in $B$, $g_0 \neq 0$ and $g_r \neq 0$.
Let $b = n - r$. Suppose, contrary to the theorem, that the minimum distance is at
most 2. Then, a polynomial code being a group code, by Theorem 32 there exists a
non-zero code polynomial $b(x)$ with at most two non-zero coefficients. There are two cases
to consider, namely $b(x) = x^i + x^j$, where $i < j$, and $b(x) = x^i$, where $i < n$. In the
first of these, since $n$ is the code length, we have $j < n$, hence $0 < j - i < n$. Since
$g(x) \mid b(x)$ implies $g(x) \mid x^i(1 + x^{j-i})$, and $g_0 \neq 0$ implies $x \nmid g(x)$, therefore
$g(x) \mid 1 + x^{j-i}$, which contradicts our hypothesis because $1 + x^{j-i} = x^{j-i} - 1$ over $B$.
In the second case, similarly $g(x) \mid x^i$, and since $x \nmid g(x)$ this forces $g(x) = 1$,
which divides $x - 1$; we again have a contradiction. □


Definition 64. Let $C$ be a $(b,n)$-code. If there exists a $b \times n$ matrix $G$ of
rank $b$ such that $C = \{aG \mid a \in B^b\}$, then $G$ is called a generator matrix of
the code $C$, and $C$ is called a matrix code generated by $G$.

§
Definition 65. Let $C$ be a $(b,n)$-code. If there exists an $(n-b) \times n$ matrix
$H$ of rank $n - b$ such that $Hc^T = 0$ for all $c$ in $C$, then $H$ is called a parity
check matrix of $C$.


§ 


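Theorem 39. A polynomial code is a matrix code.


Proof. Let $C$ be a polynomial $(b,n)$-code with the encoding polynomial
$g(x) = g_0 + \cdots + g_kx^k$. Then $n = b + k$. Let $G$ be the $b \times n$ matrix whose first row
begins with the entries $g_0, \ldots, g_k$ followed by $b - 1$ zeros, and each of whose succeeding
rows is the previous row shifted cyclically one place to the right, that is

$$G = \begin{pmatrix} g_0 & g_1 & \cdots & g_k & 0 & \cdots & 0 \\ 0 & g_0 & g_1 & \cdots & g_k & \cdots & 0 \\ \vdots & & \ddots & & & \ddots & \vdots \\ 0 & \cdots & 0 & g_0 & g_1 & \cdots & g_k \end{pmatrix}.$$

The submatrix formed by the first $b$ columns is upper triangular with $g_0$ on its
diagonal, so its determinant is $g_0^b \neq 0$, since $g_0 \neq 0$. Thus the rank of $G$ is $b$.
Let the original word to be coded be $a = (a_0, \ldots, a_{b-1})$. The $j$-th entry of $aG$ is
$\sum_i a_ig_{j-i}$ (with $g_m = 0$ for $m < 0$ or $m > k$), which is the coefficient of $x^j$ in
$a(x)g(x)$. Hence the code word $aG$ is the same as the one generated from $a(x)g(x)$,
and the two codes are identical.
□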
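The matrix built in the proof of Theorem 39 can be generated and compared with polynomial multiplication directly. This Python sketch assumes coefficient lists ordered from $x^0$ upwards. The choices $g(x) = 1 + x + x^3$ and $b = 4$ are illustrative, and `generator_matrix` is a hypothetical name.

```python
from itertools import product

def poly_mul(a, g):
    """a(x)g(x) over B, coefficients lowest degree first."""
    out = [0] * (len(a) + len(g) - 1)
    for i, ai in enumerate(a):
        for j, gj in enumerate(g):
            out[i + j] ^= ai & gj
    return out

def generator_matrix(g, b):
    """b x (b + k) matrix: row i is g_0 ... g_k shifted i places to the right."""
    return [[0] * i + list(g) + [0] * (b - 1 - i) for i in range(b)]

def encode(a, G):
    """aG over B."""
    return [sum(a[i] * G[i][j] for i in range(len(a))) % 2 for j in range(len(G[0]))]

g, b = [1, 1, 0, 1], 4
G = generator_matrix(g, b)
for row in G:
    print(row)

# aG and a(x)g(x) give the same code word for every message a.
assert all(encode(a, G) == poly_mul(a, g) for a in product((0, 1), repeat=b))
```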


Next we look at the Hamming codes, which are constructed by the procedure
of Algorithm 4. Here $[i]_2$ denotes the binary representation of the decimal
number $i$, written with its most significant digit first.


Algorithm 4 Hamming codes


choose a positive integer r
b ← 2^r - r - 1
n ← 2^r - 1
for i = 1 to 2^r - 1 do
  (the i-th row of M) ← [i]_2, written with r digits
endfor
for i = 0 to 2^b - 1 do
  (a_1, a_2, ..., a_b) ← [i]_2, written with b digits
  (b_3, b_5, b_6, b_7, b_9, ..., b_n) ← (a_1, ..., a_b)   (the positions that are not powers of 2)
  (b_{2^(j-1)} ; j = 1, ..., r) ← solve (bM = 0)
  the i-th code word ← (b_1, ..., b_n)
endfor
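Algorithm 4 can be sketched in Python as below, assuming, as in the algorithm, that the message digits fill the positions that are not powers of 2 in increasing order. The name `hamming_codewords` is hypothetical. Instead of solving $bM = 0$ by elimination, the sketch uses the fact that each of the $r$ equations contains exactly one check digit $b_{2^j}$: the equation summing the digits whose position has bit $j$ set in binary. Each check digit is therefore the sum of the other digits in its equation.

```python
from itertools import product

def hamming_codewords(r):
    """All code words (b_1, ..., b_n) of the binary Hamming code with r check digits."""
    n = 2 ** r - 1
    checks = [2 ** j for j in range(r)]                 # positions 1, 2, 4, ...
    data = [p for p in range(1, n + 1) if p not in checks]
    words = []
    for message in product((0, 1), repeat=len(data)):   # i = 0, ..., 2^b - 1
        word = dict(zip(data, message))
        for j, c in enumerate(checks):
            # Equation j of bM = 0: digits whose position has bit j set sum to 0.
            word[c] = sum(word[p] for p in data if (p >> j) & 1) % 2
        words.append(tuple(word[p] for p in range(1, n + 1)))
    return words

code = hamming_codewords(3)
print(len(code), min(sum(w) for w in code if any(w)))  # 16 3
```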


Note 8. Each code word in a Hamming code contains
$n - b = (2^r - 1) - (2^r - r - 1) = r$ check digits. The value of $r$ is called the
redundancy of the code.
§


Examples
Group, polynomial, and Hamming codes


14th January, 2007
31. Construct a Hamming code with three check digits. 


Solution. Choose $r = 3$. Then the code words will have $n = 2^r - 1 = 2^3 - 1 = 7$
digits, and the message words $m = 2^r - r - 1 = 2^3 - 3 - 1 = 4$ digits. In each
code word there are $r$ check digits, so the redundancy of the code is $r$. The
check digits are formed as follows.


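The code word is $b = b_1 b_2 b_3 \cdots b_n$, and its $r$ check digits are

$$b_{2^0},\ b_{2^1},\ b_{2^2},\ \ldots,\ b_{2^{r-1}},$$

that is, the digits in the positions which are powers of 2.

The rest of the code word consists of the $2^r - r - 1$ message digits in their usual
order. For our present problem we then have

$$\begin{array}{l|ccccccc}
\text{check digits} & b_1 & b_2 & & b_4 & & & \\
\text{code word} & b_1 & b_2 & b_3 & b_4 & b_5 & b_6 & b_7 \\
\text{message word} & & & a_1 & & a_2 & a_3 & a_4
\end{array}$$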


Next, form a $(2^r - 1) \times r$ matrix $M$, whose $i$-th row is the binary
representation of the number $i$:

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 0 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{pmatrix}.$$


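Then form the matrix equation $bM = 0$, which gives $r$ linear equations in the
$r$ unknowns $b_1, b_2, \ldots, b_{2^{r-1}}$:

$$(b_1\ b_2\ b_3\ b_4\ b_5\ b_6\ b_7) \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 0 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{pmatrix} = (0\ 0\ 0).$$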
This gives us

$$\begin{aligned} b_4 + b_5 + b_6 + b_7 &= 0, \\ b_2 + b_3 + b_6 + b_7 &= 0, \\ b_1 + b_3 + b_5 + b_7 &= 0. \end{aligned}$$


To encode a message word, we place the message in its proper positions, then
find $b_{2^i}$, where $0 \le i \le r - 1$. For example, the message $a_1a_2a_3a_4 = 1001$
yields

$$(b_1\ b_2\ 1\ b_4\ 0\ 0\ 1).$$

Then

$$\begin{array}{rcl} b_4 + 1 = 0 & \Longrightarrow & b_4 = 1, \\ b_2 + 1 + 1 = 0 & \Longrightarrow & b_2 = 0, \\ b_1 + 1 + 1 = 0 & \Longrightarrow & b_1 = 0. \end{array}$$

Therefore the encoded message is 0011001.
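Ordering the rows of $M$ as in this example has a convenient consequence, which is a standard property of Hamming codes rather than something stated above. The syndrome $wM$ of a received word $w$ containing a single error, read as a binary number, is the position of that error. The Python sketch below assumes words are tuples of 0/1 integers; the helper names are hypothetical.

```python
def syndrome(word):
    """wM, read as a binary number: the XOR of the positions holding a 1."""
    s = 0
    for position, digit in enumerate(word, start=1):
        if digit:
            s ^= position   # row `position` of M is the binary form of position
    return s

def correct_single_error(word):
    """Flip the digit whose position equals the syndrome (if it is non-zero)."""
    s = syndrome(word)
    fixed = list(word)
    if s:
        fixed[s - 1] ^= 1
    return tuple(fixed)

codeword = (0, 0, 1, 1, 0, 0, 1)          # the encoding of 1001 found above
assert syndrome(codeword) == 0
received = (0, 0, 1, 1, 1, 0, 1)          # the 5th digit has been flipped
print(syndrome(received), correct_single_error(received))  # 5 (0, 0, 1, 1, 0, 0, 1)
```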
Exercises for Group, polynomial, and Hamming codes
14th January, 2007


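32. Given the generator matrix

$$G = \begin{pmatrix} 1 & 0 & 0 & 0 & 1 & 1 & 1 \\ 0 & 1 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 & 0 & 1 & 1 \end{pmatrix},$$

find the parity check matrix, then all code words generated by the generator
matrix above, and then the dual code of that code.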

33. Check whether 974-93880-3-8, 974-93880-4-6 and 0-1392-4101-4 are
ISBNs.

34. Find the missing digits in the following ISBN’s, 974-91207—-4-3 and 974- 
9n801—-8-4. 

35. Construct tables of multiplicative inverses for GF(5) and GF(11). 

36. Find a primitive element for each of GF(5) and GF(13). 

37. Find the parity check matrix of the matrix code given by the generator
matrix

$$G = \begin{pmatrix} 1 & 0 & 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 & 1 & 0 \end{pmatrix}.$$
38. The parity check matrix

$$H = \begin{pmatrix} 1 & 0 & 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 & 1 & 0 \\ 1 & 1 & 1 & 0 & 0 & 1 \end{pmatrix}$$

defines a code $E : F_2^3 \to F_2^6$. Find all the code words. Does the resulting
code correct all single errors?

39. The exponent of a polynomial $g(x) \in F_2[x]$ is the least positive integer $e$
such that $g(x) \mid x^e - 1$. Find the exponents of the polynomials $a(x)$, $b(x)$ and
c(x) in B[z], when a(z) = 14+ 2427, b(z) =x +2? and c(z) =1+27? 421. 
40. Given an encoding polynomial g(x) = 1+ 2? + 2°. Find the generator 
matrix G of the (4,7) polynomial code. Then find the parity check matrix H 
of the code thus generated. 

41. Find the binary equivalents of the decimal numbers 543, 25, 87 and 166.
42. Find the decimal representation of the binary numbers 11001110, 
10100101, 10001100 and 10111. 

43. Find all the code words of the binary (3,5) Hamming code. 

44. Find a parity check matrix of the binary (9,13) Hamming code. 

45. Write a parity check matrix for the 7-ary (8,6)-Hamming code, then use 
it to decode the received vectors 54326010 and 11063452. 


Finite field- and BCH codes
24 December 2005


Definition 66. Let $G$ be a group and $H$ a subgroup of $G$. Then a coset of $H$ is
either a left coset $xH = \{xh : h \in H\}$ for some $x$ in $G$, or a
right coset $Hx = \{hx : h \in H\}$ of the same.

§
Definition 67. Let the polynomials $f_1(x), \ldots, f_r(x)$ in $F_q[x]$ be non-zero.
Then the least common multiple $\operatorname{lcm}(f_1(x), \ldots, f_r(x))$ of $f_1(x), \ldots, f_r(x)$ is
the monic polynomial of the lowest degree which is a multiple of all $f_j(x)$,
$j = 1, \ldots, r$.


§ 


Problem 6. Prove that for non-zero polynomials $f_1(x), \ldots, f_r(x)$ in $F_q[x]$,

$$\operatorname{lcm}(f_1(x), \ldots, f_r(x)) = \operatorname{lcm}\big(\operatorname{lcm}(f_1(x), \ldots, f_{r-1}(x)),\ f_r(x)\big).$$


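Note 9. Let $f_1(x), \ldots, f_r(x)$ in $F_q[x]$ have the factorisations

$$\begin{aligned} f_1(x) &= a_1 (p_1(x))^{e_{11}} \cdots (p_n(x))^{e_{1n}}, \\ &\ \ \vdots \\ f_r(x) &= a_r (p_1(x))^{e_{r1}} \cdots (p_n(x))^{e_{rn}}, \end{aligned}$$

where $a_1, \ldots, a_r$ are in $F_q^*$, $e_{ij} \ge 0$, and the $p_j(x)$ are distinct monic irreducible
polynomials over $F_q$. Then

$$\operatorname{lcm}(f_1(x), \ldots, f_r(x)) = (p_1(x))^{\max_i e_{i1}} \cdots (p_n(x))^{\max_i e_{in}}.$$

§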
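Theorem 40. Let $f(x), f_1(x), \ldots, f_r(x)$ be polynomials over $F_q$, with $f_1(x), \ldots, f_r(x)$
non-zero. If $f(x)$ is divisible by every polynomial $f_i(x)$, for $i = 1, \ldots, r$, then $f(x)$
is also divisible by $\operatorname{lcm}(f_1(x), \ldots, f_r(x))$.
Proof. Consider first the case where there are only two polynomials,
$f_1(x)$ and $f_2(x)$. The irreducible factors of $f_1(x)$ and $f_2(x)$ may be grouped
into those which occur in only one of them and those which are shared, and by
Note 9, $\operatorname{lcm}(f_1(x), f_2(x))$ is the product of these factors, each raised to the larger
of its exponents in $f_1(x)$ and $f_2(x)$. Since $f(x) = u_1(x)f_1(x)$ and $f(x) = u_2(x)f_2(x)$
for some $u_1(x)$ and $u_2(x)$, unique factorisation shows that each of these irreducible
factors occurs in $f(x)$ with at least that exponent. In other words,
$f(x) = u(x)\operatorname{lcm}(f_1(x), f_2(x))$ for some $u(x)$.
Next, consider the case where there are more than two $f_i$'s. Suppose that
$f(x) = u_r(x)\operatorname{lcm}(f_1(x), \ldots, f_r(x))$, and that another polynomial $f_{r+1}(x)$ also
divides $f(x)$. By Problem 6,

$$\operatorname{lcm}(f_1(x), \ldots, f_{r+1}(x)) = \operatorname{lcm}\big(\operatorname{lcm}(f_1(x), \ldots, f_r(x)),\ f_{r+1}(x)\big),$$

so applying the two-polynomial case to $\operatorname{lcm}(f_1(x), \ldots, f_r(x))$ and $f_{r+1}(x)$ gives

$$\operatorname{lcm}(f_1(x), \ldots, f_{r+1}(x)) \mid f(x).$$
□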
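Problem 6 and Theorem 40 suggest computing an lcm of several polynomials one at a time. The Python sketch below does this over $F_2$ only. It uses $\operatorname{lcm}(f, g) = fg/\gcd(f, g)$, a standard identity not proved in these notes, and Euclid's algorithm for the gcd. Polynomials are stored as integers with bit $i$ the coefficient of $x^i$, and all helper names are hypothetical.

```python
from functools import reduce

def pmul(a, b):
    """Product over F_2 (carry-less multiplication)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def pdivmod(f, g):
    """Quotient and remainder of f(x) / g(x) over F_2."""
    q, dg = 0, g.bit_length() - 1
    while f and f.bit_length() - 1 >= dg:
        shift = f.bit_length() - 1 - dg
        q ^= 1 << shift
        f ^= g << shift
    return q, f

def pgcd(f, g):
    """Greatest common divisor by Euclid's algorithm (automatically monic over F_2)."""
    while g:
        f, g = g, pdivmod(f, g)[1]
    return f

def plcm(*fs):
    """lcm(f_1, ..., f_r) = lcm(lcm(f_1, ..., f_{r-1}), f_r), as in Problem 6."""
    return reduce(lambda f, g: pdivmod(pmul(f, g), pgcd(f, g))[0], fs)

# f_1 = 1 + x, f_2 = 1 + x + x^2, f_3 = 1 + x^2 = (1 + x)^2.
L = plcm(0b11, 0b111, 0b101)
print(bin(L))                          # 0b11011, i.e. 1 + x + x^3 + x^4
# x^6 + 1 is divisible by each f_i, and hence (Theorem 40) by their lcm.
print(pdivmod(0b1000001, L)[1])        # 0
```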
Definition 68. A non-empty subset S of a ring R is called a subring of R
if the elements of S form a ring with respect to the operations defined in R. 


§ 


Theorem 41. Let $R$ be a ring. Then a non-empty subset $S$ of $R$ is a subring
if and only if $S$ is closed under addition, multiplication, and the formation of
additive inverses.


Proof. A subring is closed under these operations by definition. Conversely,
suppose $S$ is closed under them. Since $S$ is a subset of $R$, additive associativity
and commutativity are inherited by $S$ from $R$. Every element $s$ in $S$ has its
additive inverse $-s$ in $S$ by closure under the formation of additive inverses, and
then $0 = s + (-s)$ is in $S$ by closure under addition, $S$ being non-empty. Similarly,
for multiplication, both associativity and the distributive laws are inherited
from $R$ once we know that $S$ is closed under multiplication.


□


Definition 69. Let $R$ be a ring. We call a subring $I$ of $R$ an ideal in $R$ if, for
every $i$ in $I$ and every element $x$ in $R$, both $xi$ and $ix$ are also in $I$. Further,
if $I$ is a proper subset of $R$, then it is called a proper ideal. By a trivial ideal
one means either the zero ideal $\{0\}$, consisting of the zero element alone, or the
ring $R$ itself.

§ 


Note 10. The significance of ideals in a ring is that they let us construct
other rings from the first. The cosets $x + I$ of an ideal $I$ in a ring $R$ form a
partition of $R$ into equivalence classes, which are non-empty and pairwise disjoint,
and whose union is the whole of the ring $R$.

§ 


Definition 70. Let $R$ be a ring and $I$ an ideal in it. Then two elements
$x$ and $y$ in $R$ are said to be congruent modulo $I$, denoted by $x \equiv y \pmod I$,
if $x - y$ is in $I$. When there is only one ideal under consideration, we may
write this congruence simply as $x \equiv y$.


§ 


Note 11. The congruence modulo $I$ of a ring $R$ as defined in Definition 70
is an equivalence relation, since $x \equiv x$ for every $x$; $x \equiv y$ implies
$y \equiv x$; and $x \equiv y$ and $y \equiv z$ imply $x \equiv z$.

§


Note 12. Congruences can be added and multiplied as if they were ordinary
equations. In other words, if $x_1 \equiv x_2$ and $y_1 \equiv y_2$, then $x_1 + y_1 \equiv x_2 + y_2$
and $x_1y_1 \equiv x_2y_2$.


§
Definition 71. Let $R$ be a ring, $I$ an ideal in $R$, and $x$ an element of $R$. Then the
coset $[x]$ containing $x$ is the set of all elements $y$ such that $y \equiv x$. Then

$$\begin{aligned} [x] &= \{y : y \equiv x\} = \{y : y - x \in I\} \\ &= \{y : y - x = i \text{ for some } i \in I\} \\ &= \{y : y = x + i \text{ for some } i \in I\} \\ &= \{x + i : i \in I\} = x + I. \end{aligned}$$

Furthermore, $[x] = [x_1]$ means that $x \equiv x_1$, that is to say, $x - x_1$ is in $I$. Here
$x$ and $x_1$ are called representatives of the coset which contains them.


§ 


Definition 72. A quotient ring, also called a residue-class ring, factor ring or
difference ring, is a ring of the form $R/I$, where $R$ is a ring and $I$ one of
its ideals. In other words, the quotient ring of $R$ with respect to $I$ is the ring
$R/I = \{x + I : x \in R\}$, where $x + I = \{x + i : i \in I\}$ is the coset of an element
$x$ in $R$, and where addition and multiplication are defined as

$$[x] + [y] = [x + y]$$

and

$$[x] \cdot [y] = [xy].$$

§


Theorem 42. The zero element of $R/I$ is $0 + I = I$, and the negative of $x + I$
is $(-x) + I$. If $R$ is commutative, then $R/I$ is also commutative. If $R$ has an
identity 1 and $I$ is a proper ideal, then $R/I$ has the identity $1 + I$.

§ 


Problem 7. Prove Theorem 42.
§


Theorem 43. Let $R$ be a ring and $I$ an ideal of $R$. Then, for $x$ and $y$ in $R$,

$$(x + I) + (y + I) = (x + y) + I$$

and

$$(x + I)(y + I) = xy + I.$$


Proof. Let $a$ and $b$ be any two elements of the ideal $I$. Then

$$(x + a) + (y + b) = x + a + y + b = (x + y) + (a + b) = (x + y) + p,$$

where $p = a + b$ is in $I$. Further,

$$\begin{aligned} (x + a)(y + b) &= xy + xb + ay + ab \\ &= xy + c + d + e = xy + f, \end{aligned}$$

where $c = xb$, $d = ay$, $e = ab$ and $f = c + d + e$ are all elements of $I$. □
Note 13. Theorem 43 and Note 12 show that the operations of the quotient ring $R/I$
defined in Definition 72 are independent of the choice of $x$ and $y$ in the cosets
$x + I$ and $y + I$. In other words, the cosets $[x + y]$ and $[xy]$ resulting from
addition and multiplication respectively in no way depend on the particular
representatives $x$ and $y$ chosen for the cosets $[x]$ and $[y]$. This
means that, if $x_1 \equiv x$ and $y_1 \equiv y$, then $[x_1 + y_1] = [x + y]$ and $[x_1y_1] = [xy]$,
or equivalently $x_1 + y_1 \equiv x + y$ and $x_1y_1 \equiv xy$.


§ 


Example 31. Some examples of quotient rings are $\mathbb{Z}_2 = \mathbb{Z}/2\mathbb{Z}$ and
$\mathbb{Z}_6 = \mathbb{Z}/6\mathbb{Z}$.


Theorem 44. The polynomial ring $F[x]$ is a commutative ring with identity.


Proof. $F[x]$ is a ring over the field $F$, since under addition it is closed,
associative and commutative, has 0 as the identity, and each $f(x) \in F[x]$ has the
inverse $-f(x)$; and under multiplication it is closed and associative and distributes
over addition. Multiplication is also commutative, and has 1 as the identity. □


Definition 73. Let $R$ be a commutative ring with identity. Then for any $a$
in $R$ the principal ideal generated by $a$ is $(a) = aR = \{ar : r \in R\}$. Further,
$R$ is called a principal ideal ring if all its ideals are of this form.


§ 


Theorem 45. Let $F$ be a field. Then the polynomial ring $F[x]$ is a principal
ideal ring.


Proof. The polynomial ring $F[x]$ being a commutative ring with identity (Theorem 44),
it remains only to show that all its ideals are of the form $(a) = aF[x] =
\{ar : r \in F[x]\}$ for some $a$ in $F[x]$. Let $I$ be an ideal of $F[x]$. If $I = \{0\}$, then $I$ is
the principal ideal generated by 0. If $I \neq \{0\}$, then choose $0 \neq f(x) \in I$ such that
$\deg f \le \deg g$ for all non-zero $g(x)$ in $I$. Let $g(x)$ be any element of $I$; we show that
$g(x) = q(x)f(x) + r(x)$, where either $r = 0$ or $\deg r < \deg f$. If $g = 0$ or $\deg g < \deg f$,
take $q = 0$ and $r = g$. On the other hand, if $n = \deg f \le \deg g = m$, let

$$f(x) = a_0x^n + \cdots + a_n$$

and

$$g(x) = b_0x^m + \cdots + b_m.$$

Then, with $a_0 \neq 0$,

$$g(x) = a_0^{-1}b_0x^{m-n}f(x) + g_1(x), \tag{23}$$

where $g_1 = 0$ or $\deg g_1 \le m - 1$. By induction on the degree,

$$g_1(x) = q_1(x)f(x) + r(x), \tag{24}$$

where either $r = 0$ or $\deg r < \deg f$. From Equations 23
and 24, $g(x) = q(x)f(x) + r(x)$, where $q(x) = a_0^{-1}b_0x^{m-n} + q_1(x)$ is in $F[x]$. If
$r \neq 0$, then $r(x) = g(x) - q(x)f(x)$ is in $I$ and $\deg r < \deg f$, which contradicts our
choice of $f(x)$. Therefore $g = qf$ and $I$ is the principal ideal generated by $f(x)$. □
Definition 74. Let $R$ be a commutative ring with identity. Then a non-constant
polynomial $f(x)$ in $R[x]$ is said to be irreducible if, for any $g(x)$ and $h(x)$ in $R[x]$,
$f(x) = g(x)h(x)$ implies either $\deg g(x) = 0$ or $\deg h(x) = 0$. Otherwise $f(x)$
is said to be reducible.


§ 


Theorem 46. Let $F$ be a field and $f(x)$ in $F[x]$ an irreducible polynomial.
Then $F[x]/(f(x))$ is a field.


Proof. Let $I$ be the ideal $(f(x))$ of $F[x]$ generated by $f(x)$. If $I = F[x]$, then
$f(x)$ has an inverse, that is $1 = f(x)g(x)$ for some $g(x)$ in $F[x]$. Then $f(x)$
is a constant polynomial, which contradicts its being irreducible (Definition 74).
Therefore $F[x]/I$ has at least two elements, and by Theorems 42 and 44 it is a
commutative ring with identity. Let $g \in F[x]$ and $g \notin I$. Then

$$J = \{a(x)f(x) + b(x)g(x) : a(x), b(x) \in F[x]\}$$

is an ideal of $F[x]$, and by Theorem 45 there exists $h(x)$ in $F[x]$ such that $J = (h(x))$.
But $f(x) = 1 \cdot f(x) + 0 \cdot g(x)$ is in $J$, and thus $f(x) = a(x)h(x)$ for some $a(x)$ in $F[x]$.
The polynomial $f(x)$ being irreducible, either $\deg h(x) = 0$ or $\deg a(x) = 0$.
If the latter is the case, then $a(x)$ is a unit in $F[x]$, so $h(x) = a(x)^{-1}f(x)$ is in $I$;
hence $J = I$, which is a contradiction since $g$ is in $J$ but not in $I$. Therefore
$h(x)$ is a non-zero constant, hence a unit in $F[x]$, so $J = F[x]$, and thus
$1 = a(x)f(x) + b(x)g(x)$ for some $a(x)$ and $b(x)$ in $F[x]$. Then, since $a(x)f(x) \in I$,
we have $1 + I = b(x)g(x) + I = (b(x) + I)(g(x) + I)$. Thus $g(x) + I$ has an
inverse and $F[x]/I$ is a field. □
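The proof of Theorem 46 is constructive. The inverse of $g(x) + I$ comes from the expression $1 = a(x)f(x) + b(x)g(x)$, which the extended Euclidean algorithm produces. The Python sketch below applies this to $F_2[x]/(x^3 + x + 1)$, relying on the fact that $x^3 + x + 1$ is irreducible over $F_2$: it has degree 3 and no root in $F_2$. Polynomials are integers with bit $i$ the coefficient of $x^i$, and the helper names are hypothetical.

```python
def pmul(a, b):
    """Product over F_2 (carry-less multiplication)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def pdivmod(f, g):
    """Quotient and remainder of f(x) / g(x) over F_2."""
    q, dg = 0, g.bit_length() - 1
    while f and f.bit_length() - 1 >= dg:
        shift = f.bit_length() - 1 - dg
        q ^= 1 << shift
        f ^= g << shift
    return q, f

def inverse_mod(g, f):
    """b(x) with b(x)g(x) = 1 mod f(x), obtained from 1 = a(x)f(x) + b(x)g(x)."""
    r0, r1 = f, pdivmod(g, f)[1]
    t0, t1 = 0, 1          # invariant: r_i = t_i * g (mod f)
    while r1:
        q, r = pdivmod(r0, r1)
        r0, r1 = r1, r
        t0, t1 = t1, t0 ^ pmul(q, t1)
    if r0 != 1:
        raise ValueError("g(x) is not invertible modulo f(x)")
    return pdivmod(t0, f)[1]

F = 0b1011                                  # f(x) = 1 + x + x^3
for g in range(1, 8):                       # the 7 non-zero cosets g(x) + I
    b = inverse_mod(g, F)
    assert pdivmod(pmul(b, g), F)[1] == 1
    print(format(g, "03b"), "->", format(b, "03b"))
```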


Definition 75. Let $K$ be a field and $F$ a subfield of $K$. Then $K$ is called
an extension of the field $F$, denoted by $K|F$. Since elements of $K$ can be added
and multiplied by elements of $F$, $K$ is a vector space over $F$. The dimension of
the vector space $K$ over $F$ is called the degree $[K : F]$ of the extension $K$ of $F$.
The extension $K|F$ is said to be finite if the degree $[K : F]$ is finite.

§


Definition 76. A prime subfield of a field F is the intersection of all subfields 
of F. It is the smallest of all subfields of F, and is unique. A prime field is a 
field which has no proper subfields. 


§ 


Definition 77. Let $K|F$ be an extension of a field $F$. Then $a \in K$ is said
to be algebraic over $F$ if there exists a non-zero $f(x)$ in $F[x]$ which has $a$ as a root.
Let $a$ in $K$ be algebraic over $F$ and consider $A = \{f(x) \in F[x] : f(a) = 0\}$.
Here $A$ is an ideal of the principal ideal domain $F[x]$. Let $m_1(x)$ in $F[x]$ be
a generator of $A$. If $c$ is the coefficient of the highest power of $x$ in $m_1(x)$,
then $m(x) = c^{-1}m_1(x)$ is a monic polynomial with $\deg m(x) = \deg m_1(x)$,
and $m(x)$ is also a generator of $A$. Let $m(x) = r(x)s(x)$ for some $r(x)$ and
$s(x)$ in $F[x]$. Then either $r(a) = 0$ or $s(a) = 0$, that is, either $m(x) \mid r(x)$
or $m(x) \mid s(x)$. But $\deg m = \deg r + \deg s$, therefore either $\deg s(x) = 0$ or
$\deg r(x) = 0$. Hence $m(x)$ is irreducible. Since $m(x)$ is monic, irreducible, and
of the least degree possible while admitting $a$ as a root, $m(x)$ is
called the minimal polynomial of $a$ over $F$.


§ 


Theorem 47. Let $C$ be an $(n,k)$ linear code over $F_q$ with parity-check
matrix $H$, and let $d(C)$ be the smallest number of columns of $H$ that are linearly
dependent. Then, if every subset of $2t$ or fewer columns of $H$ is linearly
independent, the code is capable of correcting all error patterns of weight
$w \le t$.


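Proof. When $q = 2$, linear dependence of a set of columns amounts to some
non-empty subset of them summing to 0. The code words of $C$ are those vectors
$x$ in $V_n(F_q)$ for which $Hx^T = 0$. But $Hx^T$ is a linear combination of the
columns of $H$, that is to say, if $H = [c_1\ \cdots\ c_n]$, then $Hx^T = x_1c_1 + \cdots + x_nc_n$.
Hence a non-zero code word of weight $w$ gives a non-trivial linear dependence among
$w$ columns of $H$, and vice versa. Consequently, if every $2t$ or fewer columns of $H$
are linearly independent, every non-zero code word has weight at least $2t + 1$, so
by Theorem 32 the minimum distance of $C$ is at least $2t + 1$, and decoding to the
nearest code word corrects every error pattern of weight at most $t$. □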


Corollary 47[1]. If $q = 2$ and all possible linear combinations of up to $e$
columns of $H$ are distinct, then $d(C) \ge 2e + 1$, and $C$ can then correct all
error patterns of weight $e$ or less.


Problem 8. Prove Corollary 47[1]. 
§ 


Note 14. Hamming codes correct single errors. An extension of these is
the Bose-Chaudhuri-Hocquenghem (BCH) codes, which can correct multiple errors.
In the case of the Hamming code of length $n = 2^m - 1$, the parity-check matrix
is given by $H = [v_0\ \cdots\ v_{n-1}]$, where $(v_0, \ldots, v_{n-1})$ is some ordering
of the $2^m - 1$ non-zero column vectors in $V_m = V_m(F_2)$. The $m \times n$ matrix
$H$ uses $m$ parity-check bits for the code to be able to correct one error. We
may extend $H$ so that it has $m$ more rows and can correct two errors:

$$H_2 = \begin{pmatrix} v_0 & \cdots & v_{n-1} \\ w_0 & \cdots & w_{n-1} \end{pmatrix},$$

where $w_0, \ldots, w_{n-1}$ are in $V_m$. Since the $v_i$ are distinct, we may regard the
mapping from $v_i$ to $w_i$ as a function $f$ from $V_m$ into itself, and then

$$H_2 = \begin{pmatrix} v_0 & \cdots & v_{n-1} \\ f(v_0) & \cdots & f(v_{n-1}) \end{pmatrix}.$$

Then $H_2$ defines a code which corrects two errors if and only if the syndromes
of the $1 + n + \binom{n}{2}$ error patterns of weights 0, 1 and 2 are all distinct.
Any such syndrome is a sum of a subset of columns of $H_2$, and is therefore
a vector in $V_{2m}$. Let the syndrome be $s = (s_1\ \cdots\ s_{2m}) = (\mathbf{s}_1\ \mathbf{s}_2)$,
where $\mathbf{s}_1 = (s_1, \ldots, s_m)$ and $\mathbf{s}_2 = (s_{m+1}, \ldots, s_{2m})$ are both in $V_m$. Defining
$f(0) = 0$, a pair of errors occurring at the $i$-th and $j$-th positions gives
$s = (v_i + v_j,\ f(v_i) + f(v_j))$, and a single error at the $i$-th position corresponds to
taking $v_j = 0$. Then the system of equations

$$\begin{aligned} u + v &= \mathbf{s}_1, \\ f(u) + f(v) &= \mathbf{s}_2 \end{aligned}$$


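must have at most one solution $\{u, v\}$ for each pair of vectors $(\mathbf{s}_1, \mathbf{s}_2)$ from $V_m$.
Identifying $V_m$ with the field $F_{2^m}$, we find by trial and error that neither a
linear mapping $f(v) = Tv$ nor a polynomial of degree 2 works, but $f(v) = v^3$ does.
Taking $v_i = \alpha^i$, where $\alpha$ is a primitive element of $F_{2^m}$, the matrix

$$H_2 = \begin{pmatrix} 1 & \alpha & \alpha^2 & \cdots & \alpha^{n-1} \\ 1 & \alpha^3 & \alpha^6 & \cdots & \alpha^{3(n-1)} \end{pmatrix}$$

is the parity-check matrix of a binary code of length $n = 2^m - 1$ which corrects
up to two errors. A vector $c = (c_0\ \cdots\ c_{n-1})$ in $V_n(F_2)$ is a code word in
the code defined by $H_2$ if and only if $\sum_{i=0}^{n-1} c_i\alpha^i = 0$ and $\sum_{i=0}^{n-1} c_i\alpha^{3i} = 0$.
Since the $2m$ rows of the matrix $H_2$ over $F_2$ may not all be linearly independent,
the dimension of the code is $k \ge n - 2m = 2^m - 1 - 2m$.


§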
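For $m = 4$ the construction of Note 14 can be checked by brute force. The Python sketch below represents $F_{16}$ as $F_2[x]/(x^4 + x + 1)$ and takes $\alpha = x$. It relies on the standard fact that $x^4 + x + 1$ is primitive, so $\alpha$ has order 15; the helper names are hypothetical. The sketch lists all words $c$ of length 15 with $\sum c_i\alpha^i = \sum c_i\alpha^{3i} = 0$. It should find $2^7$ of them, so here $k = 7 = n - 2m$. It also checks that the smallest non-zero weight is 5, and that all $1 + 15 + \binom{15}{2}$ error patterns of weight at most 2 have distinct syndromes.

```python
from itertools import combinations

MOD = 0b10011            # x^4 + x + 1
N = 15

def gf_mul(a, b):
    """Product in F_16 = F_2[x]/(x^4 + x + 1), elements stored as 4-bit integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:  # reduce x^4 to x + 1
            a ^= MOD
    return result

powers = [1]             # powers[i] = alpha^i with alpha = x
for _ in range(N - 1):
    powers.append(gf_mul(powers[-1], 0b10))
cubes = [gf_mul(v, gf_mul(v, v)) for v in powers]   # f(v) = v^3

def syndrome(c):
    """(sum c_i alpha^i, sum c_i alpha^{3i}) for a word c stored as a 15-bit integer."""
    s1 = s2 = 0
    for i in range(N):
        if (c >> i) & 1:
            s1 ^= powers[i]
            s2 ^= cubes[i]
    return s1, s2

code = [c for c in range(1 << N) if syndrome(c) == (0, 0)]
min_weight = min(bin(c).count("1") for c in code if c)

patterns = ([0] + [1 << i for i in range(N)]
            + [(1 << i) | (1 << j) for i, j in combinations(range(N), 2)])
distinct = len({syndrome(e) for e in patterns}) == len(patterns)

print(len(code), min_weight, distinct)   # 128 5 True
```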
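Definition 78. The Vandermonde matrix of $a_1, \ldots, a_r$ is defined as

$$A = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ a_1 & a_2 & \cdots & a_r \\ a_1^2 & a_2^2 & \cdots & a_r^2 \\ \vdots & \vdots & & \vdots \\ a_1^{r-1} & a_2^{r-1} & \cdots & a_r^{r-1} \end{pmatrix}.$$

§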
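Theorem 48. Let $a_1, \ldots, a_r$ be distinct non-zero elements of a field. Then
the Vandermonde matrix $A$ of Definition 78 satisfies

$$\det A = \begin{vmatrix} 1 & 1 & \cdots & 1 \\ a_1 & a_2 & \cdots & a_r \\ \vdots & \vdots & & \vdots \\ a_1^{r-1} & a_2^{r-1} & \cdots & a_r^{r-1} \end{vmatrix} \neq 0.$$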
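Proof. Subtracting $a_1$ times row $i$ from row $i + 1$, for $i = r-1, r-2, \ldots, 1$ in that
order (so that each step uses a row not yet altered), yields

$$\det A = \begin{vmatrix} 1 & 1 & \cdots & 1 \\ 0 & a_2 - a_1 & \cdots & a_r - a_1 \\ 0 & a_2(a_2 - a_1) & \cdots & a_r(a_r - a_1) \\ \vdots & \vdots & & \vdots \\ 0 & a_2^{r-2}(a_2 - a_1) & \cdots & a_r^{r-2}(a_r - a_1) \end{vmatrix}.$$

Expanding along the first column and taking the factor $a_k - a_1$ out of the column
containing $a_k$ gives

$$\begin{aligned}
\det A &= (a_2 - a_1)\cdots(a_r - a_1) \begin{vmatrix} 1 & \cdots & 1 \\ a_2 & \cdots & a_r \\ \vdots & & \vdots \\ a_2^{r-2} & \cdots & a_r^{r-2} \end{vmatrix} \\
&= (a_2 - a_1)\cdots(a_r - a_1)\cdot(a_3 - a_2)\cdots(a_r - a_2) \begin{vmatrix} 1 & \cdots & 1 \\ a_3 & \cdots & a_r \\ \vdots & & \vdots \\ a_3^{r-3} & \cdots & a_r^{r-3} \end{vmatrix} \\
&= \cdots = \prod_{i > j} (a_i - a_j).
\end{aligned}$$

Then, since the $a_i$ are distinct, $\det A$ is non-zero. □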
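The product formula of Theorem 48 can be checked numerically. The Python sketch below computes the determinant exactly over the rationals by Gaussian elimination, using the standard library's `fractions.Fraction`, for the illustrative values $a = (2, 3, 5, 7)$. It then compares the result with $\prod_{i>j}(a_i - a_j)$. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import prod

def det(matrix):
    """Determinant by Gaussian elimination with exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:                      # a row swap changes the sign
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return result

def vandermonde(a):
    """Row i holds the i-th powers a_1^i, ..., a_r^i (Definition 78)."""
    return [[x ** i for x in a] for i in range(len(a))]

a = [2, 3, 5, 7]
lhs = det(vandermonde(a))
rhs = prod(a[i] - a[j] for j, i in combinations(range(len(a)), 2))   # i > j
print(lhs, rhs)  # 240 240
```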


Theorem 49. Any square matrix having a non-zero determinant has all its columns linearly independent.


Proof. Let $A$ be an $r \times r$ matrix with $|A| \neq 0$, and suppose that the columns $c_1,\dots,c_r$ of $A$ are linearly dependent. Then some column of $A$ is a linear combination of the others, say
$$c_j = \sum_{i \neq j} a_i c_i.$$
Replacing column $c_j$ by $c_j - \sum_{i \neq j} a_i c_i$ gives a matrix $B$ with $|B| = |A|$. But $B$ has a column whose entries are all zero, so $|A| = |B| = 0$, a contradiction. □


Theorem 50. Let $(\alpha_0,\dots,\alpha_{n-1})$ be an ordering of the $n = 2^m - 1$ non-zero elements of $\mathbb{F}_{2^m}$, and let $t$ be a positive integer such that $t < 2^{m-1} - 1$. Then the matrix
$$H = \begin{pmatrix} \alpha_0 & \alpha_1 & \cdots & \alpha_{n-1} \\ \alpha_0^3 & \alpha_1^3 & \cdots & \alpha_{n-1}^3 \\ \vdots & & & \vdots \\ \alpha_0^{2t-1} & \alpha_1^{2t-1} & \cdots & \alpha_{n-1}^{2t-1} \end{pmatrix}$$
is the parity-check matrix of a binary $(n,k)$-code capable of correcting all error patterns of weight $w \le t$, with dimension $k \ge n - mt$.


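Proof. A vector $\mathbf{c} = (c_0,\dots,c_{n-1})$ in $V_n(\mathbb{F}_2)$ is a code word if and only if $H\mathbf{c}^T = \mathbf{0}$, that is,
$$\sum_{i=0}^{n-1} c_i \alpha_i^j = 0$$
for $j = 1, 3, \dots, 2t-1$. We simplify this by using the facts that $(x+y)^2 = x^2 + y^2$ in characteristic 2 and that $c^2 = c$ for $c$ in $\mathbb{F}_2$. Hence
$$\Bigl(\sum_{i=0}^{n-1} c_i \alpha_i^j\Bigr)^2 = \sum_{i=0}^{n-1} c_i^2 \alpha_i^{2j} = \sum_{i=0}^{n-1} c_i \alpha_i^{2j}$$
for every $j$. Every $j \le 2t$ can be written as $2^s j_0$ with $j_0$ odd and $j_0 \le 2t-1$, so squaring $s$ times gives
$$\sum_{i=0}^{n-1} c_i \alpha_i^j = 0$$
for $j = 1, 2, \dots, 2t$. Therefore we could also use the parity-check matrix
$$H' = \begin{pmatrix} \alpha_0 & \cdots & \alpha_{n-1} \\ \alpha_0^2 & \cdots & \alpha_{n-1}^2 \\ \vdots & & \vdots \\ \alpha_0^{2t} & \cdots & \alpha_{n-1}^{2t} \end{pmatrix}.$$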


According to Theorem 47, $H'$ is the parity-check matrix of a code which corrects $t$ errors if and only if every set of $2t$ or fewer columns of $H'$ is linearly independent. A set of $r \le 2t$ columns of $H'$ has the form
$$A = \begin{pmatrix} a_1 & \cdots & a_r \\ a_1^2 & \cdots & a_r^2 \\ \vdots & & \vdots \\ a_1^{2t} & \cdots & a_r^{2t} \end{pmatrix},$$
where $a_1,\dots,a_r$ are distinct non-zero elements of $\mathbb{F}_{2^m}$. Consider the $r \times r$ matrix formed by its first $r$ rows,
$$A' = \begin{pmatrix} a_1 & \cdots & a_r \\ \vdots & & \vdots \\ a_1^r & \cdots & a_r^r \end{pmatrix}.$$
It is non-singular, since by the Vandermonde determinant theorem, Theorem 48,
$$\det A' = a_1 \cdots a_r \begin{vmatrix} 1 & \cdots & 1 \\ a_1 & \cdots & a_r \\ \vdots & & \vdots \\ a_1^{r-1} & \cdots & a_r^{r-1} \end{vmatrix} = a_1 \cdots a_r \prod_{i>j} (a_i - a_j) \neq 0.$$
Then by Theorem 49 the columns of $A'$, and hence those of $A$, cannot be linearly dependent. Therefore the code corrects all error patterns of weight up to $t$. Finally, $H$, regarded as a matrix with entries from $\mathbb{F}_2$ rather than $\mathbb{F}_{2^m}$, has dimensions $mt \times n$. Hence the dual code has dimension at most $mt$, and the code has dimension $k \ge n - mt$.


□


Theorem 51. Let $C$ be a linear $(n,k)$-code over $GF(q)$ with parity-check matrix $H$. Then the minimum distance of $C$ is $d$ if and only if every $d-1$ columns of $H$ are linearly independent but some $d$ columns are linearly dependent.


Proof. The minimum distance $d(C)$ of a linear code is equal to the minimum weight of its non-zero code words. Let $\mathbf{x} = x_1 \cdots x_n$ be a vector in $V(n,q)$. Then $\mathbf{x}$ is in $C$ if and only if $\mathbf{x}H^T = \mathbf{0}$, if and only if $x_1\mathbf{h}_1 + \cdots + x_n\mathbf{h}_n = \mathbf{0}$, where $\mathbf{h}_1,\dots,\mathbf{h}_n$ are the columns of $H$. Therefore each code word $\mathbf{x}$ of weight $d$ gives a set of $d$ linearly dependent columns of $H$, namely those $\mathbf{h}_i$ with $x_i \neq 0$. Conversely, suppose there were a set of $d-1$ linearly dependent columns $\mathbf{h}_{i_1},\dots,\mathbf{h}_{i_{d-1}}$ of $H$. Then there would exist scalars $x_{i_1},\dots,x_{i_{d-1}}$, not all zero, such that $\sum_{j=1}^{d-1} x_{i_j}\mathbf{h}_{i_j} = \mathbf{0}$. The vector $\mathbf{x}$ with these entries in positions $i_1,\dots,i_{d-1}$ and zeros elsewhere would then satisfy $\mathbf{x}H^T = \mathbf{0}$. It would therefore be a non-zero code word of weight at most $d-1 < d(C)$, a contradiction. □
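Theorem 51 can be illustrated by comparing two brute-force computations on a small parity-check matrix. The first is the minimum weight of a non-zero code word; the second is the smallest number of linearly dependent columns. The sketch assumes $q = p$ is prime. The two example matrices are illustrative choices not taken from the text: a ternary $[4,2]$ Hamming code and the binary $[7,4]$ Hamming code.

```python
from itertools import combinations, product


def rank_mod_p(vectors, p):
    """Rank over GF(p), p prime, of a list of equal-length vectors."""
    a = [[x % p for x in v] for v in vectors]
    rank = 0
    ncols = len(a[0]) if a else 0
    for c in range(ncols):
        pivot = next((r for r in range(rank, len(a)) if a[r][c]), None)
        if pivot is None:
            continue
        a[rank], a[pivot] = a[pivot], a[rank]
        inv = pow(a[rank][c], p - 2, p)
        a[rank] = [x * inv % p for x in a[rank]]
        for r in range(len(a)):
            if r != rank and a[r][c]:
                f = a[r][c]
                a[r] = [(x - f * y) % p for x, y in zip(a[r], a[rank])]
        rank += 1
    return rank


def min_distance_bruteforce(H, p):
    """Minimum weight of a non-zero x with x H^T = 0."""
    n = len(H[0])
    weights = [sum(1 for xi in x if xi)
               for x in product(range(p), repeat=n)
               if any(x) and all(sum(h * xi for h, xi in zip(row, x)) % p == 0 for row in H)]
    return min(weights)


def min_dependent_columns(H, p):
    """Smallest s such that some s columns of H are linearly dependent."""
    n = len(H[0])
    cols = [[row[j] for row in H] for j in range(n)]
    for s in range(1, n + 1):
        for idx in combinations(range(n), s):
            if rank_mod_p([cols[j] for j in idx], p) < s:
                return s
    return None


H_TERNARY = [[0, 1, 1, 1], [1, 0, 1, 2]]                            # ternary [4, 2] Hamming code
H_BINARY = [[(j >> k) & 1 for j in range(1, 8)] for k in range(3)]  # binary [7, 4] Hamming code
```

For both matrices the two functions agree: no two columns are dependent, some three are, and the minimum distance is 3.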


Theorem 52. The maximum dictionary size $m$ for which there exists a $q$-ary $(n,m,d)$-code satisfies $A_q(n,d) \le q^{n-d+1}$.


Proof. Let $C$ be a $q$-ary $(n,m,d)$-code. Remove the last $d-1$ coordinates from each code word. The $m$ vectors of length $n-d+1$ so obtained must be distinct; otherwise two code words would differ only in the last $d-1$ coordinates, so that $d(C) \le d-1 < d$, a contradiction. Therefore $m \le q^{n-d+1}$. □


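Theorem 53. Let $q$ be a prime, and let $C$ be the code over $GF(q)$ with parity-check matrix
$$H = \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & 2 & 3 & \cdots & n \\ 1 & 2^2 & 3^2 & \cdots & n^2 \\ \vdots & & & & \vdots \\ 1 & 2^{d-2} & 3^{d-2} & \cdots & n^{d-2} \end{pmatrix},$$
where $d \le n \le q-1$. Then $C$ is a $q$-ary $(n, q^{n-d+1}, d)$-code, and hence $A_q(n,d) = q^{n-d+1}$. If $q$ is a prime power, the same conclusion holds with $1, 2, \dots, n$ replaced by any $n$ distinct non-zero elements of $GF(q)$.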


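Proof. We have
$$C = \Bigl\{x_1 \cdots x_n \in V(n,q) : \sum_{i=1}^{n} x_i\, i^j = 0 \text{ for } j = 0, 1, \dots, d-2\Bigr\}.$$
Any $d-1$ columns of $H$ form a Vandermonde matrix in distinct non-zero elements of $GF(q)$, and are therefore linearly independent by Theorems 48 and 49. Since $H$ has only $d-1$ rows, any $d$ columns are linearly dependent. By Theorem 51, $C$ has minimum distance $d$. Moreover, $H$ has rank $d-1$, so $C$ has dimension $n-d+1$ and is a $q$-ary $(n, q^{n-d+1}, d)$-code. The result follows since $C$ meets the Singleton bound of Theorem 52. □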
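A brute-force check of Theorem 53 for the small case $q = 7$, $n = 5$, $d = 3$ (parameters chosen here only for illustration) is given below. The code defined by the parity checks $\sum_i x_i\, i^j = 0$, $j = 0,\dots,d-2$, should have $q^{n-d+1} = 343$ code words and minimum distance 3, meeting the Singleton bound. The function name is hypothetical.

```python
from itertools import product


def theorem53_code(q, n, d):
    """All x in GF(q)^n with sum_i x_i * i^j = 0 for j = 0, ..., d-2 (q prime)."""
    H = [[pow(i, j, q) for i in range(1, n + 1)] for j in range(d - 1)]
    return [x for x in product(range(q), repeat=n)
            if all(sum(h * xi for h, xi in zip(row, x)) % q == 0 for row in H)]


code = theorem53_code(7, 5, 3)
size = len(code)                                             # number of code words
dmin = min(sum(1 for xi in x if xi) for x in code if any(x))  # minimum non-zero weight
```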
Problem 9. Find the decoding procedure for the BCH codes.


Solution. Assume that $d = 2t+1$ and that $H$ has $2t$ rows. Suppose the code word $\mathbf{c} = c_1 \cdots c_n$ is transmitted and the vector $\mathbf{r} = r_1 \cdots r_n$ is received.


Assuming that at most $t$ errors have occurred, let $x_1,\dots,x_t$ be the field elements locating their positions (so $x_k = \alpha_i$ when the $k$th error is in position $i$), and let $m_1,\dots,m_t$ be their respective magnitudes. Then the syndrome is
$$(s_1, \dots, s_{2t}) = \mathbf{r}H^T$$


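and we have
$$s_j = \sum_{i=1}^{n} r_i \alpha_i^j = \sum_{i=1}^{t} m_i x_i^j \tag{25}$$
for $j = 1, \dots, 2t$. Then from
$$\phi(\theta) = \frac{m_1 x_1}{1 - x_1\theta} + \cdots + \frac{m_t x_t}{1 - x_t\theta} \tag{26}$$
and
$$\frac{1}{1 - x\theta} = 1 + x\theta + x^2\theta^2 + \cdots,$$
together with Equation 25, we have
$$\phi(\theta) = s_1 + s_2\theta + \cdots + s_j\theta^{j-1} + \cdots$$
Also, bringing Equation 26 to a common denominator, we have
$$\phi(\theta) = \frac{a_1 + a_2\theta + a_3\theta^2 + \cdots + a_t\theta^{t-1}}{1 + b_1\theta + b_2\theta^2 + \cdots + b_t\theta^t}. \tag{27}$$
Hence
$$(s_1 + s_2\theta + s_3\theta^2 + \cdots)(1 + b_1\theta + b_2\theta^2 + \cdots + b_t\theta^t) = a_1 + a_2\theta + \cdots + a_t\theta^{t-1}.$$
Writing $b_0 = 1$ and comparing the coefficients of $\theta^{i-1}$ gives us
$$a_1 = s_1 \quad\text{and}\quad a_i = \sum_{j=0}^{i-1} s_{i-j} b_j, \quad i = 2, \dots, t, \tag{28}$$
and
$$0 = \sum_{j=0}^{t} s_{i-j} b_j, \quad i = t+1, \dots, 2t. \tag{29}$$


With the $a_i$ and $b_i$ known, we may expand Equation 27 into partial fractions
$$\phi(\theta) = \frac{p_1}{1 - q_1\theta} + \cdots + \frac{p_t}{1 - q_t\theta}.$$
Comparing with Equation 26 gives $x_i = q_i$ and $m_i = p_i / q_i$ for $i = 1, \dots, t$, and the system in Equation 25 is solved. Algorithm 5 then gives the procedure for error correction.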


# 


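Note 15. The polynomial
$$\sigma(\theta) = 1 + b_1\theta + b_2\theta^2 + \cdots + b_t\theta^t = (1 - x_1\theta)\cdots(1 - x_t\theta) \tag{30}$$
can be used to locate the errors: the reciprocals of its zeros are the error locators $x_1,\dots,x_t$. The polynomial
$$\omega(\theta) = a_1 + a_2\theta + \cdots + a_t\theta^{t-1}$$
can be used to find the magnitudes of the errors.


§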
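Algorithm 5 Procedure for correcting up to $t$ errors in BCH codes.
  input: $\mathbf{r}$
  find $s_1, \dots, s_{2t}$
  $e \leftarrow$ rank of the linear system in Equation 29 (the number of errors that occurred)
  for $i = e+1$ to $t$ do
    $b_i \leftarrow 0$
  endfor
  $(b_1, \dots, b_e) \leftarrow$ solve the first $e$ equations of Equation 29
  $(x_1, \dots, x_e) \leftarrow$ reciprocals of the $e$ zeros of Equation 30
  $(a_1, \dots, a_e) \leftarrow$ solve Equation 28
  for $i = 1$ to $e$ do
    $m_i \leftarrow \dfrac{a_1 + a_2 x_i^{-1} + \cdots + a_e x_i^{-(e-1)}}{x_i \prod_{j \neq i} (1 - x_j x_i^{-1})}$
  endfor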
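For binary codes every error magnitude $m_i$ equals 1, so only the locators $x_i$ are needed. The sketch below specialises Algorithm 5 to the binary double-error-correcting BCH code of length 15. It assumes that $\mathbb{F}_{16}$ is built from the primitive polynomial $x^4 + x + 1$ and that $\alpha_i = \alpha^i$; both are assumptions of this sketch. The two equations of Equation 29 are solved by Cramer's rule, a direct approach often called Peterson's method, with a fallback to a single error when the determinant vanishes. The zeros of $\sigma(\theta)$ in Equation 30 are found by trying every non-zero field element. The function names are hypothetical.

```python
# Log/antilog tables for GF(16) built from x^4 + x + 1 (assumed primitive polynomial).
EXP = [0] * 30
LOG = [0] * 16
value = 1
for power in range(15):
    EXP[power] = value
    LOG[value] = power
    value <<= 1
    if value & 0x10:
        value ^= 0x13
for power in range(15, 30):
    EXP[power] = EXP[power - 15]


def mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def inv(a):
    return EXP[(15 - LOG[a]) % 15]


def syndromes(r):
    """s_j = sum_i r_i alpha^(i*j) for j = 1..4 (index 0 unused)."""
    s = [0] * 5
    for j in range(1, 5):
        for i, bit in enumerate(r):
            if bit:
                s[j] ^= EXP[(i * j) % 15]
    return s


def decode(r):
    """Correct up to two bit errors in a received word r of length 15."""
    s = syndromes(r)
    if not any(s[1:]):
        return list(r)                              # no error detected
    det = mul(s[2], s[2]) ^ mul(s[1], s[3])         # determinant of Equation 29 for t = 2
    if det:
        b1 = mul(mul(s[2], s[3]) ^ mul(s[1], s[4]), inv(det))
        b2 = mul(mul(s[2], s[4]) ^ mul(s[3], s[3]), inv(det))
        e = 2
    elif s[1]:
        b1, b2, e = mul(s[2], inv(s[1])), 0, 1      # single error: s_2 = s_1 b_1
    else:
        raise ValueError("uncorrectable error pattern")
    # sigma(theta) = 1 + b1 theta + b2 theta^2 vanishes at theta = x_i^(-1) = alpha^(-k)
    positions = [k for k in range(15)
                 if 1 ^ mul(b1, EXP[(15 - k) % 15]) ^ mul(b2, EXP[(30 - 2 * k) % 15]) == 0]
    if len(positions) != e:
        raise ValueError("more than two errors detected")
    corrected = list(r)
    for k in positions:
        corrected[k] ^= 1
    return corrected
```

For example, $g(x) = 1 + x^4 + x^6 + x^7 + x^8 = (x^4 + x + 1)(x^4 + x^3 + x^2 + x + 1)$ is a code word, since it vanishes at $\alpha$ and $\alpha^3$. Flipping any one or two of its 15 coefficients and calling `decode` should return it unchanged.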
Examples
Finite field and BCH codes


14th January, 2007
46. Construct a binary BCH code of length 7 and minimum distance 3.


Solution. Here $n = 7$ and, the code being binary, $q = 2$. Choosing $r$ smallest such that $q^r \ge n+1$, we have $2^r \ge 7 + 1 = 8$ and $r = 3$. Suppose $x^3 + x + 1$ is reducible. Then it must have either $x$ or $x+1$ as a factor, and then 0 or 1 would be a root. But dividing $x^3 + x + 1$ by $x$ leaves remainder 1, and so does dividing it by $x+1$. Thus neither of these is a factor of $x^3 + x + 1$, and hence $x^3 + x + 1$ is irreducible.


We have $p = 2$ and $r = 3$, hence $x^{p^r - 1} - 1 = x^{2^3 - 1} - 1 = x^7 - 1$.
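Long division over $\mathbb{F}_2$ gives
$$x^7 - 1 = (x^3 + x + 1)(x^4 + x^2 + x + 1),$$
so $x^3 + x + 1 \mid x^7 - 1$. Now consider $k < 7$:

- If $k = 6$, then $x^6 - 1 = (x^3 + x + 1)(x^3 + x + 1) + x^2$, and the remainder is $x^2 \neq 0$, so $x^3 + x + 1 \nmid x^6 - 1$.
- If $k = 5$, then $x^5 - 1 = (x^3 + x + 1)(x^2 + 1) + x^2 + x$, and the remainder is $x^2 + x \neq 0$, so $x^3 + x + 1 \nmid x^5 - 1$.
- If $k = 4$, then $x^4 - 1 = (x^3 + x + 1)\,x + x^2 + x + 1$, and the remainder is $x^2 + x + 1 \neq 0$, so $x^3 + x + 1 \nmid x^4 - 1$.
- When $k = 3$, $x^3 + x + 1 \nmid x^3 - 1$ is obvious, and for $k \le 2$ the degree of $x^k - 1$ is too small.

Therefore $\alpha = x + (x^3 + x + 1)$ is a primitive element, and $\alpha$ satisfies $\alpha^3 + \alpha + 1 = 0$.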


A minimal polynomial of $a$ is a monic irreducible polynomial of least possible degree which has $a$ as a root. For a finite field $F$ of order $p^n$ with $K$ as its prime subfield, $a$ and $a^p$ have the same minimal polynomial over $K$ for every $a \in F$.


Since $p = 2$, $\alpha$ and $\alpha^2$ have the same minimal polynomial, namely $x^3 + x + 1$. Hence the generator polynomial, the least common multiple of the minimal polynomials of $\alpha$ and $\alpha^2$, is $g(x) = x^3 + x + 1$. Let our message word be $a_0a_1a_2a_3$. Then the message polynomial is $a(x) = a_0 + a_1x + a_2x^2 + a_3x^3$, and the corresponding code polynomial is $a(x)(x^3 + x + 1)$. Therefore the code polynomial is
$$a_0 + (a_0 + a_1)x + (a_1 + a_2)x^2 + (a_0 + a_2 + a_3)x^3 + (a_1 + a_3)x^4 + a_2x^5 + a_3x^6.$$
So our code word is
$$(a_0,\ a_0 + a_1,\ a_1 + a_2,\ a_0 + a_2 + a_3,\ a_1 + a_3,\ a_2,\ a_3).$$
The generator polynomial has 3 non-zero terms, so the code contains a code word of weight 3 and $d \le 3$. Since $\alpha$ and $\alpha^2$ are two consecutive roots of $g(x)$, the BCH bound gives $d \ge 3$. Hence the code has minimum distance 3.


# 
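The encoding rule above is multiplication of the message polynomial by $g(x) = 1 + x + x^3$ over $\mathbb{F}_2$, and it is small enough to check exhaustively. The sketch below encodes all 16 messages and computes the minimum weight of the non-zero code words, which for a linear code is the minimum distance. The function and constant names are hypothetical.

```python
from itertools import product

G_POLY = [1, 1, 0, 1]      # g(x) = 1 + x + x^3, coefficients lowest degree first


def encode(message):
    """Coefficients of a(x) g(x) over F_2 for a message a0 a1 a2 a3."""
    word = [0] * (len(message) + len(G_POLY) - 1)
    for i, a in enumerate(message):
        if a:
            for j, gj in enumerate(G_POLY):
                word[i + j] ^= gj          # add x^i g(x) modulo 2
    return tuple(word)


code = [encode(m) for m in product((0, 1), repeat=4)]
dmin = min(sum(c) for c in code if any(c))
```

Each encoded word agrees with $(a_0,\ a_0 + a_1,\ a_1 + a_2,\ a_0 + a_2 + a_3,\ a_1 + a_3,\ a_2,\ a_3)$, and `dmin` is 3.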
Linear codes
9th December 2005


Definition 79. Let $V$ be a vector space over $\mathbb{F}_q$. Then a set of vectors $A = \{\mathbf{v}_1,\dots,\mathbf{v}_k\}$ in $V$ is said to be linearly independent if and only if $\lambda_1\mathbf{v}_1 + \cdots + \lambda_k\mathbf{v}_k = \mathbf{0}$ implies $\lambda_i = 0$ for $i = 1, \dots, k$.

§


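Definition 80. Let $V$ be a vector space over $\mathbb{F}_q$, and let $S = \{\mathbf{v}_1,\dots,\mathbf{v}_k\}$ be a non-empty subset of $V$. Then the linear span $\langle S \rangle$ of $S$ is defined as
$$\langle S \rangle = \Bigl\{\sum_{i=1}^{k} \lambda_i \mathbf{v}_i : \lambda_i \in \mathbb{F}_q\Bigr\}.$$
We say that $\langle S \rangle$ is the subspace of $V$ generated, or spanned, by $S$. Let $C$ be a subspace of $V$. Then a subset $S$ of $C$ is called a generating set, or spanning set, of $C$ if $C = \langle S \rangle$.

§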


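Definition 81. An inner product on $\mathbb{F}_q^n$ is a mapping $\langle\cdot,\cdot\rangle : \mathbb{F}_q^n \times \mathbb{F}_q^n \to \mathbb{F}_q$ such that, for all $\mathbf{u}, \mathbf{v}, \mathbf{w}$ in $\mathbb{F}_q^n$,

a. $\langle \mathbf{u}+\mathbf{v}, \mathbf{w}\rangle = \langle \mathbf{u},\mathbf{w}\rangle + \langle \mathbf{v},\mathbf{w}\rangle$

b. $\langle a\mathbf{v},\mathbf{w}\rangle = a\langle \mathbf{v},\mathbf{w}\rangle$, where $a$ is a scalar

c. $\langle \mathbf{v},\mathbf{w}\rangle = \langle \mathbf{w},\mathbf{v}\rangle$

d. $\langle \mathbf{u},\mathbf{v}\rangle = 0$ for all $\mathbf{u}$ in $\mathbb{F}_q^n$ if and only if $\mathbf{v} = \mathbf{0}$


§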


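Definition 82. Let $\mathbf{v}$ and $\mathbf{w}$ be two vectors in $\mathbb{F}_q^n$. Then the scalar product of $\mathbf{v}$ and $\mathbf{w}$, also known as the dot product or Euclidean inner product, is defined as $\mathbf{v}\cdot\mathbf{w} = \sum_{i=1}^{n} v_i w_i \in \mathbb{F}_q$. The two vectors are said to be orthogonal to each other if and only if $\mathbf{v}\cdot\mathbf{w} = 0$. The orthogonal complement $S^\perp$ of a non-empty subset $S$ of $\mathbb{F}_q^n$ is defined to be
$$S^\perp = \{\mathbf{v} \in \mathbb{F}_q^n : \mathbf{v}\cdot\mathbf{s} = 0 \text{ for all } \mathbf{s} \in S\}.$$
When $S = \emptyset$ we define $S^\perp = \mathbb{F}_q^n$.
§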


Note 16. The orthogonal complement $S^\perp$ of a non-empty subset $S$ of the vector space $\mathbb{F}_q^n$ is always a subspace of $\mathbb{F}_q^n$. Moreover, $\langle S \rangle^\perp = S^\perp$.


§ 


Definition 83. Let $V$ be a vector space over $\mathbb{F}_q$. Then a non-empty subset $A = \{\mathbf{v}_1,\dots,\mathbf{v}_n\}$ of $V$ is called a basis for $V$ if $V = \langle A \rangle$ and $A$ is linearly independent.


§ 
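Theorem 54. Let $V$ be a vector space over $\mathbb{F}_q$. If $\dim V = k$, then $V$ has $q^k$ elements and
$$\frac{1}{k!} \prod_{i=0}^{k-1} (q^k - q^i)$$
different bases.


Proof. Let $B = \{\mathbf{v}_1,\dots,\mathbf{v}_k\}$ be a basis for $V$. Then $V = \{\sum_{i=1}^{k} \lambda_i\mathbf{v}_i : \lambda_1,\dots,\lambda_k \in \mathbb{F}_q\}$. Since $|\mathbb{F}_q| = q$, there are $q$ choices for each $\lambda_i$, and since $B$ is linearly independent, different choices give different vectors. Therefore $V$ has exactly $q^k$ elements.

Now count the bases $B = \{\mathbf{v}_1,\dots,\mathbf{v}_k\}$ of $V$. Since $B$ is linearly independent, $\mathbf{v}_1 \neq \mathbf{0}$, and there are $q^k - 1$ choices for $\mathbf{v}_1$. Then there are $q^k - q^{i-1}$ choices for $\mathbf{v}_i$, $i = 2,\dots,k$, because $\mathbf{v}_i \notin \langle \mathbf{v}_1,\dots,\mathbf{v}_{i-1}\rangle$, a subspace with $q^{i-1}$ elements. Therefore there are $\prod_{i=0}^{k-1}(q^k - q^i)$ distinct ordered $k$-tuples $(\mathbf{v}_1,\dots,\mathbf{v}_k)$. The order of $\mathbf{v}_1,\dots,\mathbf{v}_k$ is irrelevant, and each basis arises from $k!$ orderings. Hence the number of distinct bases for $V$ is $\frac{1}{k!}\prod_{i=0}^{k-1}(q^k - q^i)$. □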
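The counting formula of Theorem 54 can be compared with a direct enumeration for $q = 2$. The sketch below encodes vectors of $\mathbb{F}_2^k$ as integer bitmasks, an encoding chosen here, so that vector addition is XOR. It counts the $k$-element subsets of non-zero vectors that have rank $k$. The function names are hypothetical.

```python
from itertools import combinations
from math import factorial, prod


def rank_gf2(vectors):
    """Rank over F_2 of vectors encoded as integer bitmasks."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)      # XOR in b exactly when v has b's leading bit set
        if v:
            basis.append(v)
    return len(basis)


def count_bases_bruteforce(k):
    """Number of unordered bases of F_2^k, found by checking every k-subset."""
    nonzero = range(1, 2 ** k)
    return sum(1 for subset in combinations(nonzero, k) if rank_gf2(subset) == k)


def count_bases_formula(q, k):
    """(1/k!) * prod_{i=0}^{k-1} (q^k - q^i), as in Theorem 54."""
    return prod(q ** k - q ** i for i in range(k)) // factorial(k)
```

For $k = 3$ both give $(8-1)(8-2)(8-4)/3! = 28$.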


Corollary 54[1]. Let $C$ be a linear code of length $n$ over $\mathbb{F}_q$. Then $\dim C = \log_q |C|$; in other words, $|C| = q^{\dim C}$.
§


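Theorem 55. Let $S$ be a subset of $\mathbb{F}_q^n$. Then $\dim \langle S\rangle + \dim S^\perp = n$.


Proof. When $\langle S \rangle = \{\mathbf{0}\}$, we have $S^\perp = \mathbb{F}_q^n$ and the result is obvious. Next, consider the case $\dim\langle S\rangle = k$, where $1 \le k \le n$. Let $\{\mathbf{v}_1,\dots,\mathbf{v}_k\}$ be a basis of $\langle S \rangle$. We need to show that $\dim S^\perp = \dim \langle S\rangle^\perp = n-k$. Now $\mathbf{x}$ is in $S^\perp$ if and only if $\mathbf{v}_1\cdot\mathbf{x} = \cdots = \mathbf{v}_k\cdot\mathbf{x} = 0$, or equivalently $A\mathbf{x}^T = \mathbf{0}$, where $A$ is the $k \times n$ matrix
$$A = \begin{pmatrix} \mathbf{v}_1 \\ \vdots \\ \mathbf{v}_k \end{pmatrix}.$$
The rows of $A$ are linearly independent. Then $A\mathbf{x}^T = \mathbf{0}$ is a linear system of $k$ linearly independent equations in $n \ge k$ variables, and therefore its solution space has dimension $n - k$. □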


Corollary 55[1]. Let $C$ be a linear code of length $n$ over $\mathbb{F}_q$. Then $C^\perp$ is also a linear code, and $\dim C + \dim C^\perp = n$.


Proof. This follows from Note 16 and Theorem 55 above. □


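Theorem 56. Let $C$ be a linear code of length $n$ over $\mathbb{F}_q$. Then $(C^\perp)^\perp = C$.


Proof. From Corollary 55[1], we have $\dim C + \dim C^\perp = n$ and $\dim C^\perp + \dim (C^\perp)^\perp = n$, and hence $\dim C = \dim (C^\perp)^\perp$. Let $\mathbf{c}$ be in $C$. Then $\mathbf{c}\cdot\mathbf{x} = 0$ for all $\mathbf{x}$ in $C^\perp$, hence $C \subseteq (C^\perp)^\perp$. Since the two subspaces have the same dimension, $C = (C^\perp)^\perp$. □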
Definition 84. A linear code of length $n$ over $\mathbb{F}_q$ is a subspace of $\mathbb{F}_q^n$. The dual code $C^\perp$ of $C$ is the orthogonal complement of the subspace $C$ of $\mathbb{F}_q^n$. The dimension of the linear code $C$ is the dimension of $C$ as a vector space over $\mathbb{F}_q$, that is to say, $\dim C$. A linear code $C$ of length $n$ and dimension $k$ over $\mathbb{F}_q$ is called a $q$-ary $[n,k]$-code, or an $(n, q^k)$-linear code. If the distance $d$ of $C$ is known, it is called an $[n,k,d]$-linear code. Furthermore, $C$ is said to be self-orthogonal if $C \subseteq C^\perp$, and self-dual if $C = C^\perp$.

§


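Definition 85. Let $\mathbf{x}$ be a word in $\mathbb{F}_q^n$. Then the Hamming weight $w(\mathbf{x})$ of $\mathbf{x}$ is defined as the number of non-zero letters in $\mathbf{x}$. In other words, $w(\mathbf{x}) = d(\mathbf{x}, \mathbf{0})$, where $\mathbf{0}$ is the zero word and $d(\mathbf{x},\mathbf{y})$ is the Hamming distance between two words $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{F}_q^n$. For each element $x$ of $\mathbb{F}_q$, the Hamming weight may be defined as
$$w(x) = d(x, 0) = \begin{cases} 1, & \text{if } x \neq 0, \\ 0, & \text{if } x = 0. \end{cases}$$
Then for $\mathbf{x} = (x_1,\dots,x_n)$ in $\mathbb{F}_q^n$,
$$w(\mathbf{x}) = w(x_1) + \cdots + w(x_n).$$
§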


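Theorem 57. Let $\mathbf{x}$ and $\mathbf{y}$ be two words in $\mathbb{F}_q^n$. Then $d(\mathbf{x},\mathbf{y}) = w(\mathbf{x} - \mathbf{y})$.


Proof. For each pair of letters $x$ and $y$ in $\mathbb{F}_q$, we know that $d(x,y) = 0$ if and only if $x = y$, that is, if and only if $x - y = 0$, or equivalently $w(x - y) = 0$. Since both $d(x,y)$ and $w(x-y)$ take only the values 0 and 1, this gives $d(x,y) = w(x-y)$. The proof follows since $w(\mathbf{x}) = \sum_{i=1}^{n} w(x_i)$ and $d(\mathbf{x},\mathbf{y}) = \sum_{i=1}^{n} d(x_i, y_i)$. □


Corollary 57[1]. Let $q$ be even. Then for any two words $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{F}_q^n$ we have $d(\mathbf{x},\mathbf{y}) = w(\mathbf{x}+\mathbf{y})$.


Proof. The proof follows from the fact that $a = -a$ for all $a$ in $\mathbb{F}_q$ when $q$ is even, since $\mathbb{F}_q$ then has characteristic 2. □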


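Theorem 58. Let $\mathbf{x}$ and $\mathbf{y}$ be two words in $\mathbb{F}_2^n$. Then $w(\mathbf{x}) + w(\mathbf{y}) \ge w(\mathbf{x}+\mathbf{y})$.


Proof. For $\mathbf{x} = (x_1,\dots,x_n)$ and $\mathbf{y} = (y_1,\dots,y_n)$ in $\mathbb{F}_2^n$, let $\mathbf{x} * \mathbf{y} = (x_1y_1,\dots,x_ny_n)$. Then, for $q = 2$ and $n = 1$,
$$\begin{array}{cc|c|c|c} x & y & x*y & w(x)+w(y)-2w(x*y) & w(x+y) \\ \hline 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 & 1 \\ 1 & 0 & 0 & 1 & 1 \\ 1 & 1 & 1 & 0 & 0 \end{array}$$
From this, together with Definition 85, we know that $w(\mathbf{x}+\mathbf{y}) = w(\mathbf{x}) + w(\mathbf{y}) - 2w(\mathbf{x}*\mathbf{y})$ for $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{F}_2^n$. Since $w(\mathbf{x}*\mathbf{y}) \ge 0$, the theorem follows. □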
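The identity $w(\mathbf{x}+\mathbf{y}) = w(\mathbf{x}) + w(\mathbf{y}) - 2w(\mathbf{x}*\mathbf{y})$ used in the proof, and the inequality of Theorem 58 that follows from it, can be checked exhaustively for small $n$. The helper names in this sketch are hypothetical.

```python
from itertools import product


def weight(x):
    """Hamming weight: number of non-zero coordinates."""
    return sum(1 for xi in x if xi)


def weight_identity_holds(n):
    """Check w(x + y) = w(x) + w(y) - 2 w(x * y) and Theorem 58 for all x, y in F_2^n."""
    words = list(product((0, 1), repeat=n))
    for x in words:
        for y in words:
            total = tuple(a ^ b for a, b in zip(x, y))      # x + y over F_2
            meet = tuple(a & b for a, b in zip(x, y))       # x * y, coordinatewise product
            if weight(total) != weight(x) + weight(y) - 2 * weight(meet):
                return False
            if weight(total) > weight(x) + weight(y):       # Theorem 58
                return False
    return True
```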
Problem 10. Prove that for any prime power $q$ and $\mathbf{x}, \mathbf{y}$ in $\mathbb{F}_q^n$,
$$w(\mathbf{x}) + w(\mathbf{y}) \ge w(\mathbf{x}+\mathbf{y}) \ge w(\mathbf{x}) - w(\mathbf{y}).$$


§ 


Definition 86. Let $A$ be a matrix over $\mathbb{F}_q$. An elementary row operation performed on $A$ is any one of the following:
a. interchange of two rows,
b. multiplication of a row by a non-zero scalar,
c. replacement of a row by its sum with a scalar multiple of another row.


Two matrices are said to be row equivalent to each other if one is obtainable from the other by a sequence of elementary row operations.


§ 


Definition 87. Any matrix is row equivalent, through a sequence of elementary row operations performed upon it, to a matrix in row echelon (RE) form and to a matrix in reduced row echelon (RRE)† form. The RRE form of any given matrix is unique, but its RE forms need not be.


§ 


† In the RRE form, the leading entry of each non-zero row is equal to 1 and is the only non-zero entry in its column.
Examples
Linear codes


14th January, 2007


47. Let $q = 2$. Let $S = \{0001, 0010, 0100\}$ be a subset of a vector space $V$ over $\mathbb{F}_2$. Find the linear span $\langle S \rangle$ of $S$.


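Solution. Write the elements of the span as
$$\lambda_1(0001) + \lambda_2(0010) + \lambda_3(0100),$$
where $\lambda_i \in \mathbb{F}_2$, $i = 1, 2, 3$.
$$\begin{array}{c|l} \lambda_1\lambda_2\lambda_3 & \text{vector} \\ \hline 000 & 0000 \\ 100 & 0001 \\ 010 & 0010 \\ 001 & 0100 \\ 011 & 0010 + 0100 = 0110 \\ 101 & 0001 + 0100 = 0101 \\ 110 & 0001 + 0010 = 0011 \\ 111 & 0001 + 0010 + 0100 = 0111 \end{array}$$
Hence $\langle S \rangle = \{0000, 0001, 0010, 0100, 0110, 0101, 0011, 0111\}$.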


# 


48. Let $q = 3$ and $S = \{12101, 20110, 01122, 11010\}$. Find a basis for $C = \langle S \rangle$.


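Solution. We have
$$A = \begin{pmatrix} 1&2&1&0&1 \\ 2&0&1&1&0 \\ 0&1&1&2&2 \\ 1&1&0&1&0 \end{pmatrix}.$$
Reduce $A$ into row echelon form by elementary row operations over $\mathbb{F}_3$. Thus,
$$A \xrightarrow{\substack{R_2 - 2R_1 \\ R_4 - R_1}} \begin{pmatrix} 1&2&1&0&1 \\ 0&2&2&1&1 \\ 0&1&1&2&2 \\ 0&2&2&1&2 \end{pmatrix} \xrightarrow{R_2 \leftrightarrow R_3} \begin{pmatrix} 1&2&1&0&1 \\ 0&1&1&2&2 \\ 0&2&2&1&1 \\ 0&2&2&1&2 \end{pmatrix}$$
$$\xrightarrow{\substack{R_3 - 2R_2 \\ R_4 - 2R_2}} \begin{pmatrix} 1&2&1&0&1 \\ 0&1&1&2&2 \\ 0&0&0&0&0 \\ 0&0&0&0&1 \end{pmatrix} \xrightarrow{R_3 \leftrightarrow R_4} \begin{pmatrix} 1&2&1&0&1 \\ 0&1&1&2&2 \\ 0&0&0&0&1 \\ 0&0&0&0&0 \end{pmatrix}$$


Then $\{12101, 01122, 00001\}$ is a basis for $C$.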
# 
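The row reduction in Example 48 can be automated. The sketch below assumes $q = p$ is prime. It reduces a list of vectors to row echelon form over $GF(p)$, scales each leading entry to 1, and returns the non-zero rows, which form a basis of the span. Applied to the set $S$ of Example 49 it returns an echelon basis rather than a subset of $S$, but that basis spans the same code. The function name is hypothetical.

```python
def row_space_basis(vectors, p):
    """Row echelon form over GF(p), p prime; returns the non-zero rows, a basis of the span."""
    rows = [[x % p for x in v] for v in vectors]
    pivot_row = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(pivot_row, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue                                   # no leading entry in this column
        rows[pivot_row], rows[pivot] = rows[pivot], rows[pivot_row]
        inv = pow(rows[pivot_row][col], p - 2, p)
        rows[pivot_row] = [x * inv % p for x in rows[pivot_row]]   # leading entry 1
        for r in range(pivot_row + 1, len(rows)):
            f = rows[r][col]
            rows[r] = [(a - f * b) % p for a, b in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
    return rows[:pivot_row]
```

For the set $S$ of Example 48 over $GF(3)$ this returns the rows $12101$, $01122$, $00001$ found above.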
49. Let $q = 2$ and $S = \{11101, 10110, 01011, 11010\}$. Find a basis for $C = \langle S \rangle$.


Solution. Form a matrix $A$ whose columns are the vectors of $S$, according to Algorithm 4.2, and reduce it into row echelon form:
$$A = \begin{pmatrix} 1&1&0&1 \\ 1&0&1&1 \\ 1&1&0&0 \\ 0&1&1&1 \\ 1&0&1&0 \end{pmatrix} \to \begin{pmatrix} 1&1&0&1 \\ 0&1&1&0 \\ 0&0&0&1 \\ 0&1&1&1 \\ 0&1&1&1 \end{pmatrix} \to \begin{pmatrix} 1&1&0&1 \\ 0&1&1&0 \\ 0&0&0&1 \\ 0&0&0&0 \\ 0&0&0&0 \end{pmatrix}$$


Then the leading columns, which are those containing the leading 1 of some row, are columns 1, 2 and 4. Therefore $\{11101, 10110, 11010\}$ forms a basis for $C$.


# 
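50. Let $C$ be the binary $[5,3]$-linear code with the generator matrix
$$G = \begin{pmatrix} 1&0&1&1&0 \\ 0&1&0&1&1 \\ 0&0&1&0&1 \end{pmatrix}.$$
Encode the message $\mathbf{u} = 101$ and find the information rate of $C$.
Solution. The message $\mathbf{u}$ is encoded into $\mathbf{v}$ as
$$\mathbf{v} = \mathbf{u}G = (1\ 0\ 1)\begin{pmatrix} 1&0&1&1&0 \\ 0&1&0&1&1 \\ 0&0&1&0&1 \end{pmatrix} = (1\ 0\ 0\ 1\ 1).$$
#
The information rate of $C$ is $k/n = 3/5$.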
# 


51. Find the cosets of the binary linear code
$$C = \{0000, 1011, 0101, 1110\}.$$


Solution. We find the cosets one after another, and check whether each is a new one.
0000 + C = {0000, 1011, 0101, 1110} → I
0001 + C = {0001, 1010, 0100, 1111} → II
0010 + C = {0010, 1001, 0111, 1100} → III
0011 + C = {0011, 1000, 0110, 1101} → IV
0100 + C → II
0101 + C → I
0110 + C → IV
0111 + C → III
1000 + C → IV
1001 + C → III
1010 + C → II
1011 + C → I
1100 + C → III
1101 + C → IV
1110 + C → I
1111 + C → II


Thus there are four cosets, namely I, II, III and IV as shown.


# 
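The same search can be done mechanically: form $\mathbf{u} + C$ for every $\mathbf{u} \in \mathbb{F}_2^4$ and keep each coset the first time it appears. In the sketch below, words are stored as bit strings for readability, a choice made here. The helper names are hypothetical.

```python
from itertools import product


def add_words(u, v):
    """Coordinatewise sum over F_2 of two binary strings."""
    return "".join(str(int(a) ^ int(b)) for a, b in zip(u, v))


def cosets(code, n):
    """Distinct cosets u + C, in the order their first representative appears."""
    found = []
    for bits in product("01", repeat=n):
        u = "".join(bits)
        coset = frozenset(add_words(u, c) for c in code)
        if coset not in found:
            found.append(coset)
    return found


C = ["0000", "1011", "0101", "1110"]
```

Running `cosets(C, 4)` visits $0000, 0001, 0010, 0011, \dots$ in the same order as the table above, and returns the cosets I, II, III and IV in that order.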
Cyclic codes
6th January 2005


Definition 88. A subset $S$ of $\mathbb{F}_q^n$ is cyclic if $(a_0,\dots,a_{n-1}) \in S$ implies $(a_{n-1}, a_0, \dots, a_{n-2}) \in S$. A linear code $C$ is called a cyclic code if $C$ is a cyclic set. The word
$$(u_{n-r}, \dots, u_{n-1}, u_0, u_1, \dots, u_{n-r-1})$$
is said to be obtained from the word $(u_0,\dots,u_{n-1})$ in $\mathbb{F}_q^n$ by cyclically shifting $r$ positions.


§ 


Definition 89. Let $R$ be a ring. A non-empty subset $I$ of $R$ is called an ideal if both $a + b$ and $a - b$ belong to $I$ and $r \cdot a$ is in $I$, for all $a$ and $b$ in $I$ and $r$ in $R$.


§ 


Note 17. $\mathbb{F}_q^n$, also denoted by $V(n,q)$, is the vector space of all vectors of length $n$ over $\mathbb{F}_q$, which is also known as $GF(q)$. Here we suppose that $n$ and $q$ are relatively prime, that is to say, $\gcd(n,q) = 1$.


§ 


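Cyclic codes can also be described using the mapping of Definition 90.


Definition 90. Let $\pi$ be the mapping
$$\pi : \mathbb{F}_q^n \to \mathbb{F}_q[x]/\langle x^n - 1\rangle,$$
where $\langle x^n - 1\rangle$, sometimes denoted by $(x^n - 1)$, denotes the ideal of the polynomial ring $\mathbb{F}_q[x]$ generated by $x^n - 1$, defined by
$$\pi(a_0,\dots,a_{n-1}) = a_0 + a_1x + \cdots + a_{n-1}x^{n-1} + \langle x^n - 1\rangle$$
for all $a_i$ in $\mathbb{F}_q$, $0 \le i \le n-1$. In other words,
$$\pi : (a_0,\dots,a_{n-1}) \mapsto a_0 + a_1x + \cdots + a_{n-1}x^{n-1}.$$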


§ 


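Note 18. $\mathbb{F}_q[x]/\langle x^n - 1\rangle$ is also a vector space over $\mathbb{F}_q$, and $\pi$ is an $\mathbb{F}_q$-linear transformation of vector spaces over $\mathbb{F}_q$. In fact it is a vector space isomorphism. We can therefore identify $\mathbb{F}_q^n$ with $\mathbb{F}_q[x]/\langle x^n - 1\rangle$, and a vector $\mathbf{u} = (a_0,\dots,a_{n-1})$ with the polynomial $u(x) = \sum_{i=0}^{n-1} a_ix^i$.
§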
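Theorem 59. Let $\pi$ be the mapping defined in Definition 90, and let $C$ be a linear code of length $n$ over $\mathbb{F}_q$. Then $C$ is a cyclic code if and only if $\pi(C)$ is an ideal in the quotient ring $\mathbb{F}_q[x]/\langle x^n - 1\rangle$.

Proof. Since $C$ is a linear code of length $n$ over $\mathbb{F}_q$, it is a subspace of the vector space $\mathbb{F}_q^n$, and $\pi(C)$ is a subspace of $\mathbb{F}_q[x]/\langle x^n - 1\rangle$. Let $(a_0,\dots,a_{n-1})$ be in $C$. Then $(a_{n-1}, a_0, \dots, a_{n-2})$ is in $C$ if and only if
$$a_{n-1} + a_0x + \cdots + a_{n-2}x^{n-1} + \langle x^n - 1\rangle = x(a_0 + a_1x + \cdots + a_{n-1}x^{n-1}) + \langle x^n - 1\rangle$$
is in $\pi(C)$. Hence $C$ is cyclic if and only if the subspace $\pi(C)$ is closed under multiplication by $x$. By induction this is closure under multiplication by every power of $x$, and by linearity under multiplication by every polynomial; that is, $\pi(C)$ is an ideal. □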


Example 32. Recall that an $[n,k,d]$-linear code is a linear code of length $n$, dimension $k$ (the number of elements in a basis) and distance $d$. The binary $[3,2,2]$-linear code $C = \{000, 110, 101, 011\}$ is a cyclic code, and $\pi(C) = \{0, 1+x, 1+x^2, x+x^2\}$ is a subset of $\mathbb{F}_2[x]/\langle x^3 - 1\rangle$. Moreover, $\pi(C)$ is an ideal.

Example 33. The set of all integers divisible by a fixed positive integer $m$ is an ideal of $\mathbb{Z}$. All the polynomials in the polynomial ring $\mathbb{F}_q[x]$ that are divisible by a fixed non-zero polynomial $f(x)$ form an ideal.


Definition 91. An ideal $I$ of a ring $R$ is called a principal ideal if there exists an element $g$ in $I$ such that
$$I = \langle g \rangle = \{gr : r \in R\}.$$
The element $g$ is called a generator of $I$, and $I$ is said to be generated by $g$. A ring is called a principal ideal ring if all its ideals are principal.


§ 
Examples
Cyclic codes
14th January, 2007


52. Determine whether the sets $\{(0,1,1,2), (2,0,1,1), (1,2,0,1), (1,1,2,0)\} \subset \mathbb{F}_3^4$ and $\{11111\} \subset \mathbb{F}_2^5$ are cyclic codes.


Solution. Both are cyclic sets, since $(a_{n-1}, a_0, a_1, \dots, a_{n-2})$ is in $S$ for all $(a_0,\dots,a_{n-1})$ in $S$. But for the former set, $0112 + 2011 = 2120$, which is not in the set. The set is therefore not linear, and so it is not a cyclic code.


Similarly for the latter, $11111 + 11111 = 00000$ is not in the set. Hence this set is not a cyclic code either.


#
53. Show that in the ring $\mathbb{F}_2[x]/\langle x^3 - 1\rangle$ the subset
$$I = \{0, 1+x, x+x^2, 1+x^2\}$$
is an ideal.


Solution. A non-empty subset $I$ of a ring $R$ is an ideal if both $a+b$ and $a-b$ belong to $I$ for all $a$ and $b$ in $I$, and if $r \cdot a \in I$ for all $r$ in $R$ and $a$ in $I$. Here $q = 2$, from which we know that $a + b = a - b$. We have $a + a = 0$ and $0 + a = a$ for every $a$, and
$$(1+x) + (x+x^2) = 1+x^2, \quad (1+x) + (1+x^2) = x+x^2, \quad (x+x^2) + (1+x^2) = 1+x.$$
So $I$ is closed under addition and subtraction.


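Next, the ring $R$ is
$$\mathbb{F}_2[x]/\langle x^3 - 1\rangle = \{0, 1, x, 1+x, x^2, 1+x^2, x+x^2, 1+x+x^2\},$$
and its subset to consider is
$$I = \{0, 1+x, x+x^2, 1+x^2\}.$$
The multiplication table of $r \cdot a$ for all $r \in R$ and $a \in I$ is
$$\begin{array}{c|cccc} r \backslash a & 0 & 1+x & 1+x^2 & x+x^2 \\ \hline 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 1+x & 1+x^2 & x+x^2 \\ x & 0 & x+x^2 & 1+x & 1+x^2 \\ 1+x & 0 & 1+x^2 & x+x^2 & 1+x \\ x^2 & 0 & 1+x^2 & x+x^2 & 1+x \\ 1+x^2 & 0 & x+x^2 & 1+x & 1+x^2 \\ x+x^2 & 0 & 1+x & 1+x^2 & x+x^2 \\ 1+x+x^2 & 0 & 0 & 0 & 0 \end{array}$$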
Table 5. $r \cdot a$ for all $r \in R$ and $a \in I$.


From Table 5 we can see that $r \cdot a$ is in $I$ for all $r \in R$ and $a \in I$. Hence we conclude that the subset $I$ of $R$ is an ideal.


# 


54. Find how many binary cyclic codes of length 6 there are. Give one cyclic code as an example.


Solution. First we factorise the polynomial $x^6 - 1 \in \mathbb{F}_2[x]$ thus:
$$x^6 - 1 = (1+x)^2(1+x+x^2)^2.$$
Next, list all the monic divisors of $x^6 - 1$:
$$1,\ 1+x,\ 1+x+x^2,\ (1+x)^2,\ (1+x)(1+x+x^2),$$
$$(1+x)^2(1+x+x^2),\ (1+x+x^2)^2,\ (1+x)(1+x+x^2)^2,\ 1+x^6.$$
Since there are nine of these, there are nine binary cyclic codes of length 6. We can then write down all these cyclic codes based on the map $\pi$. For example, the one corresponding to the polynomial $(1+x+x^2)^2$ is found from
$$(1+x+x^2)^2 = (1+x+x^2)(1+x+x^2) = 1 + x^2 + x^4,$$
$$0 \cdot (1+x^2+x^4) = 0 \to 000000$$
$$1 \cdot (1+x^2+x^4) = 1+x^2+x^4 \to 101010$$
$$x \cdot (1+x^2+x^4) = x+x^3+x^5 \to 010101$$
Adding these gives one more word, $010101 + 101010 = 111111$. Hence the cyclic code is


{000000, 101010, 010101, 111111} 
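Counting cyclic codes amounts to counting monic divisors of $x^n - 1$. Each such divisor $g(x)$ generates one cyclic code, of dimension $n - \deg g(x)$. In the sketch below, polynomials over $\mathbb{F}_2$ are stored as integer bitmasks (bit $i$ holds the coefficient of $x^i$, an encoding chosen here). It finds the divisors by trial division and lists the code words $a(x)g(x)$ in the same $c_0c_1\cdots c_{n-1}$ order used above. The function names are hypothetical.

```python
def poly_mul(a, b):
    """Product of two polynomials over F_2 stored as bitmasks (bit i = coefficient of x^i)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def poly_mod(a, b):
    """Remainder of a(x) divided by b(x) over F_2."""
    deg_b = b.bit_length() - 1
    while a and a.bit_length() - 1 >= deg_b:
        a ^= b << (a.bit_length() - 1 - deg_b)
    return a


def cyclic_code_generators(n):
    """Monic divisors of x^n - 1 over F_2; each generates one binary cyclic code of length n."""
    x_n_minus_1 = (1 << n) | 1          # x^n - 1 = x^n + 1 over F_2
    return [g for g in range(1, x_n_minus_1 + 1) if poly_mod(x_n_minus_1, g) == 0]


def cyclic_code(g, n):
    """All code words a(x) g(x) with deg a < n - deg g, written as c_0 c_1 ... c_{n-1}."""
    k = n - (g.bit_length() - 1)
    words = {poly_mul(a, g) for a in range(2 ** k)}
    return sorted(format(w, f"0{n}b")[::-1] for w in words)
```

Here `len(cyclic_code_generators(6))` is 9, matching the count above, and `cyclic_code(0b10101, 6)` lists the code generated by $1 + x^2 + x^4$. With $n = 7$ the same functions reproduce the two $[7,3]$ codes of Example 55 from the generators `0b10111` $= 1 + x + x^2 + x^4$ and `0b11101` $= 1 + x^2 + x^3 + x^4$.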


55. Based on the factorisation $x^7 - 1 = (1+x)(1+x^2+x^3)(1+x+x^3) \in \mathbb{F}_2[x]$, find the number of different binary $[7,3]$-cyclic codes.


Solution. First we list all the 8 monic divisors of $x^7 - 1$:
$$1,\ 1+x,\ 1+x^2+x^3,\ 1+x+x^3,$$
$$(1+x)(1+x^2+x^3),\ (1+x)(1+x+x^3),$$
$$(1+x^2+x^3)(1+x+x^3),\ (1+x)(1+x^2+x^3)(1+x+x^3).$$
We need a code whose dimension is $k = 3$. Write down a basis of the code generated by each of these monic divisors:
$$\begin{array}{ll} 1, x, x^2, \dots, x^6 & \Rightarrow k = 7 \\ 1+x,\ x+x^2,\ x^2+x^3,\ \dots,\ x^5+x^6 & \Rightarrow k = 6 \\ (1+x^2+x^3), \dots & \Rightarrow k = 4 \\ (1+x+x^3), \dots & \Rightarrow k = 4 \\ (1+x)(1+x^2+x^3),\ x^i(1+x)(1+x^2+x^3)\ (i = 1,2) & \Rightarrow k = 3 \\ (1+x)(1+x+x^3),\ x^i(1+x)(1+x+x^3)\ (i = 1,2) & \Rightarrow k = 3 \\ (1+x^2+x^3)(1+x+x^3) & \Rightarrow k = 1 \\ (1+x)(1+x^2+x^3)(1+x+x^3) & \Rightarrow k = 0 \end{array}$$


From this we see that exactly two of the divisors, those of degree $7 - 3 = 4$, give $k = 3$. This means that there are exactly two different binary $[7,3]$-cyclic codes.


For the first one, $(1+x)(1+x^2+x^3) = 1 + x + x^2 + x^4$, and
$$0 \cdot (1+x+x^2+x^4) \to 0000000$$
$$1 \cdot (1+x+x^2+x^4) \to 1110100$$
$$x \cdot (1+x+x^2+x^4) \to 0111010$$
$$x^2 \cdot (1+x+x^2+x^4) \to 0011101$$

The sums of two or three of these non-zero words yield


1110100 + 0111010 = 1001110 
1110100 + 0011101 = 1101001 
0111010 + 0011101 = 0100111 
1110100 + 0111010 + 0011101 = 1010011 


Therefore,
$$\langle (1+x)(1+x^2+x^3) \rangle = \{0000000, 1110100, 0111010, 0011101, 1001110, 0100111, 1010011, 1101001\}.$$
# 


Similarly, for the second one, $(1+x)(1+x+x^3) = 1 + x^2 + x^3 + x^4$ and
$$\langle (1+x)(1+x+x^3) \rangle = \{0000000, 1011100, 0101110, 0010111, 1001011, 1100101, 1110010, 0111001\}.$$
# 
Exercises on cyclic codes
14th January, 2007


56. Determine all the binary cyclic codes of length 9.
57. Is $x^3 + x^2 + 1$ irreducible over $\mathbb{F}_2$? If it is, then use it to generate the binary code of length 7 and dimension 3.
Goppa codes
13th January 2006


Definition 92. A linear code with parameters $[n,k,d]$ such that $k + d = n + 1$ is called a maximum distance separable (MDS) code.


§ 


Theorem 60. Let $C$ be a linear code over $\mathbb{F}_q$ with parameters $[n,k,d]$. Let $G$ be a generator matrix, and $H$ a parity-check matrix, for $C$. Then the following statements are equivalent.

a. $C$ is an MDS code,

b. every set of $n-k$ columns of $H$ is linearly independent,

c. every set of $k$ columns of $G$ is linearly independent,

d. $C^\perp$ is an MDS code.


§ 


Definition 93. An MDS code $C$ over $\mathbb{F}_q$ is said to be trivial if and only if $C$ satisfies one of the following cases.
a. $C = \mathbb{F}_q^n$,
b. $C$ is equivalent to the code generated by $\mathbf{1} = (1,\dots,1)$,
c. $C$ is equivalent to the dual of the code generated by $\mathbf{1}$.
$C$ is said to be nontrivial if it is not trivial.


§ 


The class of Bose, Chaudhuri and Hocquenghem (BCH) codes is a generalisation of the Hamming codes for multiple-error correction. Binary BCH codes were introduced by A Hocquenghem (1959) and then independently by R C Bose and D K Ray-Chaudhuri (1960). D Gorenstein and N Zierler (1961) generalised the binary BCH codes to $q$-ary ones. The class of Reed-Solomon (RS) codes, a subclass of BCH codes, was introduced by I S Reed and G Solomon (1960). Goppa codes, a generalisation of BCH codes introduced by V D Goppa (1970 and 1971), are also used in cryptography, for example in the McEliece and the Niederreiter cryptosystems. The Goppa codes are in turn a subclass of alternant codes, which were introduced by H J Helgert in 1974.


Theorem 61. Let $(\alpha_0, \alpha_1, \dots, \alpha_{n-1})$ be an arbitrary ordering of the $n = 2^m - 1$ non-zero elements of $\mathbb{F}_{2^m}$. Then a word $\mathbf{c} = (c_0,\dots,c_{n-1})$ is a code word of the BCH code if and only if $\sum_{i=0}^{n-1} c_i\alpha_i^j = 0$ for $j = 1, 2, \dots, 2t$.

§


Definition 94. A $q$-ary Reed-Solomon (RS) code is a $q$-ary BCH code of length $q-1$ generated by
$$g(x) = (x - \alpha^{a+1})(x - \alpha^{a+2})\cdots(x - \alpha^{a+\delta-1}),$$
where $\alpha$ is a primitive element of $\mathbb{F}_q$, $a \ge 0$ and $2 \le \delta \le q-1$.
Theorem 62. Reed-Solomon codes are MDS. This means that a $q$-ary Reed-Solomon code of length $q-1$ generated by $g(x) = \prod_{i=a+1}^{a+\delta-1}(x - \alpha^i)$ is a $[q-1, q-\delta, \delta]$-cyclic code for any $2 \le \delta \le q-1$.

§


Theorem 63. Let $C$ be a $q$-ary RS code generated by $g(x) = \prod_{i=1}^{\delta-1}(x - \alpha^i)$, where $2 \le \delta \le q-1$. Then the extended code $\bar{C}$ is also MDS.
§


Theorem 64. Let $\alpha$ be a primitive element of the finite field $\mathbb{F}_q$, and let $q-1 \ge \delta \ge 2$. The narrow-sense $q$-ary RS code with generator polynomial
$$g(x) = (x - \alpha)(x - \alpha^2)\cdots(x - \alpha^{\delta-1})$$
is equal to
$$\{(f(1), f(\alpha), f(\alpha^2), \dots, f(\alpha^{q-2})) : f(x) \in \mathbb{F}_q[x] \text{ and } \deg f(x) < q - \delta\}.$$
§
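Theorem 64 can be checked directly in a small field. The sketch below takes $q = 7$, $\alpha = 3$ (a primitive element of $GF(7)$) and $\delta = 3$, choices made here for illustration. It builds the cyclic code generated by $g(x) = (x - \alpha)(x - \alpha^2)$ as the set of all products $a(x)g(x)$ with $\deg a(x) < q - \delta$. It then builds the evaluation code of Theorem 64 and compares the two sets. It also computes the minimum distance, which Theorem 62 predicts to be $\delta$. Names are hypothetical.

```python
from itertools import product

q, alpha, delta = 7, 3, 3      # alpha = 3 generates the multiplicative group of GF(7)
n = q - 1


def poly_mul(a, b):
    """Product of polynomials over GF(q), coefficients listed lowest degree first."""
    result = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            result[i + j] = (result[i + j] + ai * bj) % q
    return result


def poly_eval(f, x):
    """Evaluate f at x over GF(q) by Horner's rule."""
    acc = 0
    for coeff in reversed(f):
        acc = (acc * x + coeff) % q
    return acc


g = [1]
for i in range(1, delta):                       # g(x) = prod_{i=1}^{delta-1} (x - alpha^i)
    g = poly_mul(g, [(-pow(alpha, i, q)) % q, 1])

k = n - (len(g) - 1)                            # dimension q - delta
points = [pow(alpha, i, q) for i in range(n)]   # 1, alpha, ..., alpha^(q-2)

cyclic_code = {tuple(poly_mul(list(a), g)) for a in product(range(q), repeat=k)}
evaluation_code = {tuple(poly_eval(f, x) for x in points)
                   for f in product(range(q), repeat=k)}
dmin = min(sum(1 for c in word if c) for word in cyclic_code if any(word))
```

Here $g(x) = x^2 + 2x + 6$, both sets contain $7^4 = 2401$ words and coincide, and the minimum distance is 3.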


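Theorem 65. Let $\alpha$ be a primitive element of $\mathbb{F}_q$, and let $q-1 \ge \delta \ge 2$. The matrix
$$\begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & \alpha & \alpha^2 & \cdots & \alpha^{q-2} \\ 1 & \alpha^2 & \alpha^4 & \cdots & \alpha^{2(q-2)} \\ \vdots & & & & \vdots \\ 1 & \alpha^{q-\delta-1} & \alpha^{2(q-\delta-1)} & \cdots & \alpha^{(q-2)(q-\delta-1)} \end{pmatrix}$$
is a generator matrix for the RS code generated by the polynomial
$$g(x) = (x - \alpha)(x - \alpha^2)\cdots(x - \alpha^{\delta-1}).$$


§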
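Definition 95. Let $n \le q$. Let $\mathbf{a} = (\alpha_1, \alpha_2, \dots, \alpha_n)$, where the $\alpha_i$, $1 \le i \le n$, are distinct elements of $\mathbb{F}_q$. Let $\mathbf{v} = (v_1,\dots,v_n)$, where $v_i \in \mathbb{F}_q^*$ for all $1 \le i \le n$. For $k \le n$, the generalised Reed-Solomon code $GRS_k(\mathbf{a},\mathbf{v})$ is defined as
$$\{(v_1f(\alpha_1), v_2f(\alpha_2), \dots, v_nf(\alpha_n)) : f(x) \in \mathbb{F}_q[x] \text{ and } \deg f(x) < k\}.$$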


§ 


Theorem 66. The dual of the generalised Reed-Solomon code $GRS_k(\mathbf{a},\mathbf{v})$ over $\mathbb{F}_q$ of length $n$ is $GRS_{n-k}(\mathbf{a},\mathbf{v}')$ for some $\mathbf{v}' \in (\mathbb{F}_q^*)^n$.
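Theorem 67. With $\mathbf{v}' = (v'_1,\dots,v'_n)$ as in Theorem 66, the matrix
$$\begin{pmatrix} v'_1 & v'_2 & \cdots & v'_n \\ v'_1\alpha_1 & v'_2\alpha_2 & \cdots & v'_n\alpha_n \\ v'_1\alpha_1^2 & v'_2\alpha_2^2 & \cdots & v'_n\alpha_n^2 \\ \vdots & & & \vdots \\ v'_1\alpha_1^{n-k-1} & v'_2\alpha_2^{n-k-1} & \cdots & v'_n\alpha_n^{n-k-1} \end{pmatrix}$$
is a parity-check matrix for $GRS_k(\mathbf{a},\mathbf{v})$.


§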


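Definition 96. An alternant code $\mathcal{A}_k(\mathbf{a},\mathbf{v}')$ over the finite field $\mathbb{F}_q$ is the subfield subcode $GRS_k(\mathbf{a},\mathbf{v})|_{\mathbb{F}_q}$, where $GRS_k(\mathbf{a},\mathbf{v})$ is a generalised RS code over $\mathbb{F}_{q^m}$, for some $m \ge 1$.


§
Theorem 68. The alternant code $\mathcal{A}_k(\mathbf{a},\mathbf{v}')$ has parameters $[n, k', d]$, where $mk - (m-1)n \le k' \le k$ and $d \ge n - k + 1$.
§
Theorem 69. The dual of the alternant code $\mathcal{A}_k(\mathbf{a},\mathbf{v}')$ is
$$\mathrm{Tr}_{\mathbb{F}_{q^m}/\mathbb{F}_q}\bigl(GRS_{n-k}(\mathbf{a},\mathbf{v}')\bigr).$$
§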


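Theorem 70. Let $n$, $h$, $\delta$ and $m$ be any positive integers. If
$$\sum_{w=1}^{\delta-1} (q-1)^w \binom{n}{w} < (q^m - 1)^{\lfloor (n-h)/m \rfloor},$$
then there exists an alternant code $\mathcal{A}_k(\mathbf{a},\mathbf{v}')$ over $\mathbb{F}_q$, which is the subfield subcode of a generalised RS code over $\mathbb{F}_{q^m}$, having parameters $[n, k', d]$, where $k' \ge h$ and $d \ge \delta$.

§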


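Definition 97. Let $g(z)$ be a polynomial in $\mathbb{F}_{q^m}[z]$. Let $L = \{\alpha_1,\dots,\alpha_n\}$ be a subset of $\mathbb{F}_{q^m}$ such that $L \cap \{\text{zeros of } g(z)\} = \emptyset$. Let
$$R_{\mathbf{c}}(z) = \sum_{i=1}^{n} \frac{c_i}{z - \alpha_i}$$
for $\mathbf{c} = (c_1,\dots,c_n) \in \mathbb{F}_q^n$. Then the Goppa code $\Gamma(L,g)$ is defined as
$$\Gamma(L,g) = \{\mathbf{c} \in \mathbb{F}_q^n : R_{\mathbf{c}}(z) \equiv 0 \pmod{g(z)}\}.$$
The polynomial $g(z)$ is called the Goppa polynomial. The Goppa code $\Gamma(L,g)$ is said to be irreducible if $g(z)$ is irreducible.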


§ 


Theorem 71. A word $\mathbf{c}$ is a code word of the Goppa code, that is to say, $\mathbf{c} \in \Gamma(L,g)$, if and only if
$$\sum_{i=1}^{n} c_i\, g(\alpha_i)^{-1} \alpha_i^j = 0 \quad \text{for } j = 0, 1, \dots, t-1,$$
where $t = \deg g(z)$.
§
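Theorem 72. Given a Goppa polynomial $g(z)$ of degree $t$ and $L = \{\alpha_1,\dots,\alpha_n\}$, we have $\Gamma(L,g) = \{\mathbf{c} \in \mathbb{F}_q^n : \mathbf{c}H^T = \mathbf{0}\}$, where
$$H = \begin{pmatrix} g(\alpha_1)^{-1} & \cdots & g(\alpha_n)^{-1} \\ \alpha_1 g(\alpha_1)^{-1} & \cdots & \alpha_n g(\alpha_n)^{-1} \\ \vdots & & \vdots \\ \alpha_1^{t-1} g(\alpha_1)^{-1} & \cdots & \alpha_n^{t-1} g(\alpha_n)^{-1} \end{pmatrix}.$$
§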
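Theorem 73. Given a Goppa polynomial $g(z)$ of degree $t$ and $L = \{\alpha_1,\dots,\alpha_n\}$, the Goppa code $\Gamma(L,g)$ is the alternant code $\mathcal{A}_{n-t}(\mathbf{a},\mathbf{v}')$, where $\mathbf{a} = (\alpha_1,\dots,\alpha_n)$ and
$$\mathbf{v}' = (g(\alpha_1)^{-1},\dots,g(\alpha_n)^{-1}).$$
§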
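Theorem 74. The Goppa code $\Gamma(L,g)$ is $GRS_{n-t}(\mathbf{a},\mathbf{v})|_{\mathbb{F}_q}$, where $\mathbf{v} = (v_1,\dots,v_n)$ and
$$v_i = \frac{g(\alpha_i)}{\prod_{j \neq i}(\alpha_i - \alpha_j)}$$
for all $1 \le i \le n$.
§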
Theorem 75. Given a Goppa polynomial $g(z)$ of degree $t$ and $L = \{\alpha_1,\dots,\alpha_n\}$, the Goppa code $\Gamma(L,g)$ is a linear code over $\mathbb{F}_q$ with parameters $[n,k,d]$, where $k \ge n - mt$ and $d \ge t + 1$.
§


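Theorem 76. The dual of the Goppa code $\Gamma(L,g)$ is the trace code $\mathrm{Tr}_{\mathbb{F}_{q^m}/\mathbb{F}_q}\bigl(GRS_t(\mathbf{a},\mathbf{v}')\bigr)$, where $\mathbf{v}' = (g(\alpha_1)^{-1},\dots,g(\alpha_n)^{-1})$.
§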


Theorem 77. Let $q = 2$. Given a polynomial $g(z)$, let $\bar{g}(z)$ denote the perfect square polynomial of lowest degree that is divisible by $g(z)$, and let $\bar{t}$ be the degree of $\bar{g}(z)$. For a vector $\mathbf{c} = (c_1,\dots,c_n) \in \mathbb{F}_2^n$ of weight $w$, where $c_{i_1} = \cdots = c_{i_w} = 1$, let
$$f_{\mathbf{c}}(z) = \prod_{j=1}^{w} (z - \alpha_{i_j}).$$
The derivative of $f_{\mathbf{c}}(z)$ is
$$f'_{\mathbf{c}}(z) = \sum_{l=1}^{w} \prod_{j \neq l} (z - \alpha_{i_j}).$$
Then $\mathbf{c} \in \mathbb{F}_2^n$ belongs to $\Gamma(L,g)$ if and only if $\bar{g}(z)$ divides $f'_{\mathbf{c}}(z)$. Consequently, the minimum distance $d$ of $\Gamma(L,g)$ satisfies $d \ge \bar{t} + 1$. If $g(z)$ has no multiple root, that is, $g(z)$ is a separable polynomial, then $\bar{g}(z) = g(z)^2$ and $d \ge 2t + 1$, where $t = \deg g(z)$.


§
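As a small illustration of Theorems 72, 75 and 77, the sketch below builds the binary Goppa code with $L = \mathbb{F}_8$ and $g(z) = z^2 + z + 1$. This example is chosen here, not taken from the text: $g$ has no zeros in $\mathbb{F}_8$ (its roots are primitive cube roots of unity, which lie in $\mathbb{F}_4$) and no repeated roots. The code uses the parity-check matrix of Theorem 72, with $\mathbb{F}_8 = \mathbb{F}_2[x]/(x^3 + x + 1)$ and elements stored as 3-bit integers. With $n = 8$, $m = 3$ and $t = 2$, Theorem 75 gives $k \ge 2$ and Theorem 77 gives $d \ge 5$. The brute-force search should therefore find an $[8, 2, 5]$ code, since a binary code of length 8 with $d \ge 5$ cannot have dimension 3 (Griesmer bound). Names are hypothetical.

```python
from itertools import product

MOD = 0b1011  # x^3 + x + 1, assumed primitive polynomial for GF(8)


def mul(a, b):
    """Multiply two elements of GF(8) given as 3-bit integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= MOD
    return result


def power(a, e):
    result = 1
    for _ in range(e):
        result = mul(result, a)
    return result


def inverse(a):
    return power(a, 6)          # a^7 = 1 for non-zero a in GF(8)


def goppa_poly(z):
    return mul(z, z) ^ z ^ 1    # g(z) = z^2 + z + 1


L = list(range(8))              # L = GF(8), all eight field elements
t = 2                           # degree of g(z)
# Parity-check matrix of Theorem 72: row j has entries alpha_i^j / g(alpha_i).
H = [[mul(power(a, j), inverse(goppa_poly(a))) for a in L] for j in range(t)]


def is_codeword(c):
    """c in F_2^8 lies in Gamma(L, g) iff c H^T = 0 over GF(8)."""
    for row in H:
        total = 0
        for ci, h in zip(c, row):
            if ci:
                total ^= h
        if total:
            return False
    return True


code = [c for c in product((0, 1), repeat=len(L)) if is_codeword(c)]
dmin = min(sum(c) for c in code if any(c))
```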


Theorem 78. There exists a $q$-ary Goppa code $\Gamma(L,g)$, where $g(z)$ is an irreducible polynomial in $\mathbb{F}_{q^m}[z]$ of degree $t$ and $L = \mathbb{F}_{q^m}$, with parameters $[q^m, k, d]$ such that $k \ge q^m - mt$, provided that


y Pa oy & < 79" (1-¢-De*) 


w=t+l 
Exercises for Goppa codes
13th January 2006


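58. Let
$$G = \begin{pmatrix} v_1 & v_2 & \cdots & v_n \\ v_1\alpha_1 & v_2\alpha_2 & \cdots & v_n\alpha_n \\ \vdots & & & \vdots \\ v_1\alpha_1^{k-1} & v_2\alpha_2^{k-1} & \cdots & v_n\alpha_n^{k-1} \end{pmatrix}$$
be a generator matrix for the generalised RS code $GRS_k(\mathbf{a},\mathbf{v})$. Let $C$ be the code with generator matrix $(G \mid \mathbf{u}^T)$, where $\mathbf{u} = (0,\dots,0,u)$ for some $u \in \mathbb{F}_q^*$. Let $\mathbf{v}' = (v'_1,\dots,v'_n)$ be such that $GRS_{n-k}(\mathbf{a},\mathbf{v}')$ is the dual of $GRS_k(\mathbf{a},\mathbf{v})$.

i. Show that there is some $w \in \mathbb{F}_q^*$ such that
$$\sum_{i=1}^{n} v_i v'_i \alpha_i^{n-1} + uw = 0.$$
ii. Show that
$$H' = \begin{pmatrix} v'_1 & v'_2 & \cdots & v'_n & 0 \\ v'_1\alpha_1 & v'_2\alpha_2 & \cdots & v'_n\alpha_n & 0 \\ \vdots & & & & \vdots \\ v'_1\alpha_1^{n-k-1} & v'_2\alpha_2^{n-k-1} & \cdots & v'_n\alpha_n^{n-k-1} & 0 \\ v'_1\alpha_1^{n-k} & v'_2\alpha_2^{n-k} & \cdots & v'_n\alpha_n^{n-k} & w \end{pmatrix}$$
is a parity-check matrix for $C$.
iii. Prove that $C$ is an MDS code.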
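59. Let $n$ be odd and let $\mathbb{F}_{2^m}$ be an extension of $\mathbb{F}_2$ containing all the $n$th roots of 1. Let $\alpha$ be a primitive $n$th root of 1 in $\mathbb{F}_{2^m}$ and let $L = \{1, \alpha, \dots, \alpha^{n-1}\}$. For $\mathbf{c} = (c_0,\dots,c_{n-1}) \in \mathbb{F}_2^n$, let
$$R_{\mathbf{c}}(z) = \sum_{i=0}^{n-1} \frac{c_i}{z - \alpha^i},$$
and let $\xi(z)$ be the Mattson-Solomon polynomial of $\mathbf{c}$.
i. Show that $\xi(z) = \bigl(z(z^n + 1)R_{\mathbf{c}}(z) \bmod (z^n - 1)\bigr)$ and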


n= yo) 


4i=0 


ii. Show that the Goppa code $\Gamma(L,g)$ is equal to
$$\Gamma(L,g) = \{\mathbf{c} \in \mathbb{F}_2^n : \bigl(z^{n-1}\xi(z) \bmod (z^n - 1)\bigr) \equiv 0 \pmod{g(z)}\}.$$
(Hint: For (i), show that $z(z^n + 1)R_{\mathbf{c}}(z) = \sum_{i=0}^{n-1} c_i z \prod_{j \neq i}(z + \alpha^j)$. Then show that $z\prod_{j \neq i}(z + \alpha^j) \bmod (z^n - 1) = \sum_{j=0}^{n-1} \alpha^{-ij} z^j$ by multiplying both sides by $z + \alpha^i$. For (ii), show that $\mathbf{c} \in \Gamma(L,g)$ if and only if $\sum_{i=0}^{n-1} c_i \prod_{j \neq i}(z + \alpha^j) \equiv 0 \pmod{g(z)}$, and then use (i).)
MDS code
20th January 2006


Theorem 79. Given a redundancy $r$ and a minimum distance $d$, an $[n, n-r, d]$-code satisfies $d \le r + 1$.


§ 


Definition 98. A linear $[n, k, d]$ code over $F$ with $d = n - k + 1$ is called a
maximum distance separable (MDS) code.


In other words, an MDS code is an $[n, n - r, r + 1]$ code.
§ 


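Theorem 80. Suppose $2 \le r \le q$. Let $\alpha_1, \ldots, \alpha_{q-1}$ be the non-zero elements
of $GF(q)$. Then the matrix
$$H = \begin{pmatrix} 1 & 1 & \cdots & 1 & 1 & 0 \\ \alpha_1 & \alpha_2 & \cdots & \alpha_{q-1} & 0 & 0 \\ \alpha_1^2 & \alpha_2^2 & \cdots & \alpha_{q-1}^2 & 0 & 0 \\ \vdots & \vdots & & \vdots & \vdots & \vdots \\ \alpha_1^{r-1} & \alpha_2^{r-1} & \cdots & \alpha_{q-1}^{r-1} & 0 & 1 \end{pmatrix}$$
is the parity check matrix of an MDS $[q + 1, q + 1 - r, r + 1]$ code. Equivalently,
the columns of $H$ form a $(q + 1)$-arc in $PG(r - 1, q)$.
§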
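As a small numerical check of Theorem 80, the sketch below builds the columns of $H$ for a prime $q$ and tests by brute force that every $r$ of them are linearly independent. Taking $q$ prime is an assumption made so that $GF(q)$ is simply the integers modulo $q$. Prime powers would need genuine field arithmetic, which is not shown. The helper names `det_mod_p`, `arc_matrix` and `every_r_columns_independent` are illustrative and do not come from the notes.

```python
from itertools import combinations

def det_mod_p(rows, p):
    """Determinant of a square integer matrix modulo a prime p, by Gaussian elimination."""
    m = [list(r) for r in rows]
    n = len(m)
    det = 1
    for col in range(n):
        # find a row with a non-zero entry in this column to pivot on
        pivot = next((r for r in range(col, n) if m[r][col] % p), None)
        if pivot is None:
            return 0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap changes the sign of the determinant
        det = det * m[col][col] % p
        inv = pow(m[col][col], p - 2, p)  # inverse via Fermat's little theorem
        for r in range(col + 1, n):
            factor = m[r][col] * inv % p
            for c in range(col, n):
                m[r][c] = (m[r][c] - factor * m[col][c]) % p
    return det % p

def arc_matrix(q, r):
    """Columns of H for a prime q: (1, a, ..., a^(r-1)) for a != 0,
    then (1, 0, ..., 0) and (0, ..., 0, 1)."""
    cols = [[pow(a, i, q) for i in range(r)] for a in range(1, q)]
    cols.append([1] + [0] * (r - 1))
    cols.append([0] * (r - 1) + [1])
    return cols

def every_r_columns_independent(cols, q):
    """True if every r of the given length-r columns are linearly independent over GF(q)."""
    r = len(cols[0])
    return all(det_mod_p(zip(*chosen), q) != 0 for chosen in combinations(cols, r))

print(every_r_columns_independent(arc_matrix(5, 3), 5))  # True
```

Appending the column $(0, 1, 0)$ to the $q = 5$, $r = 3$ example destroys the property. The columns $(0, 1, 0)$, $(1, 1, 1)$ and $(1, 4, 1)$ are then linearly dependent, so the enlarged set is no longer an arc.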


Theorem 81. Let $C$ be a linear $[n, k, d]$ code over a field $F$ of $q$ elements,
where $q$ is a prime power, with a parity check matrix $H$. Then $C$ has a non-zero code
word of weight $w \le l$ if and only if some $l$ columns of $H$ are linearly dependent.


§ 


Theorem 82. Let C be a linear [n,k,d] code over F with a parity check 
matrix H. Then C is an MDS code if and only if every n — k columns of H 
are linearly independent. 


§ 


Theorem 83. If a linear $[n, k, d]$ code $C$ is MDS, then so is its dual $C^\perp$.
§ 


Corollary 83[1]. Let $C$ be an $[n, k, d]$ linear code over $F = GF(q)$. Then
the following statements are equivalent.
a. $C$ is MDS.
b. Every $k$ columns of a generator matrix $G$ of $C$ are linearly independent.
c. Every $n - k$ columns of a parity check matrix $H$ of $C$ are linearly independent.
§


Problem 11. Show that linear [n,1,n], [n,n — 1,2] and [n,n, 1] codes exist 
over any finite field F. 
§


Definition 99. We call trivial MDS codes the [n,1,n], [n,n — 1,2] and 
[n, n, 1] codes. 


§ 


Theorem 84. The only binary MDS codes are the trivial ones. 
§


Definition 100. A square matrix is said to be non-singular if its columns
are linearly independent. Given any matrix $A$, an $s \times s$ square submatrix of $A$
is an $s \times s$ matrix consisting of the entries from some $s$ rows and $s$ columns of
$A$.


§ 


Theorem 85. Let $C$ be an $[n, k, -]$ code with parity check matrix $H =
(A \mid I_{n-k})$. Then $C$ is an MDS code if and only if every square submatrix of
$A$ is non-singular.


Proof. Let $B_r$ be an $r \times r$ square submatrix of $A$ which rests upon the $i_1$th, $i_2$th, ...,
$i_r$th rows of $A$, with $i_1 < i_2 < \cdots < i_r \le n - k$. Let $M_r$ be the square submatrix
of $H$ of order $n - k$ having the columns of $A$ which occur in $B_r$, together with
the remaining $n - k - r$ columns of $I_{n-k}$, namely those other than the $i_1$th, $i_2$th, ...,
$i_r$th columns. We can then find the determinant of $M_r$ by pivoting successively on the
ones in these columns of $I_{n-k}$, which gives $\det M_r = \pm \det B_r$.
Therefore $B_r$ is non-singular if and only if $M_r$ is. Since every choice of $n - k$ columns
of $H$ gives a matrix of the form $M_r$, every $n - k$ columns of $H$ are linearly
independent if and only if every square submatrix of $A$ is non-singular, and the
result follows from Theorem 82. □
Theorem 86. Let $C$ be an $[n, k, -]$ code with generator matrix $G =
(I_k \mid A)$. Then $C$ is an MDS code if and only if every square submatrix
of $A$ is non-singular.


§ 


Theorem 87. Let $C$ be an $[n, k, d]$ MDS code. Then any $k$ symbols of the
code words may be taken as message symbols.


§ 


Theorem 88. Let C be an [n,k,d] code over GF(q). Then C is an MDS 
code if and only if C has a minimum distance code word with non-zero entries 
in any d coordinates. 


§ 


Corollary 88[1]. The number of code words of weight $n - k + 1$ in an
$[n, k, d]$ MDS code over $GF(q)$ is
$$\binom{n}{n - k + 1}(q - 1).$$
Problem 12. Given $k$ and $q$, find the largest value, $m(k, q)$, of $n$ such that an
$[n, k, n - k + 1]$ MDS code exists over $GF(q)$.
§


Because of Theorem 83, Problem 12 is equivalent to Problem 13. 


Problem 13. Given $k$ and $q$, find the largest $n$ for which there is a $k \times n$
matrix over $GF(q)$, every $k$ columns of which are linearly independent.


§ 


Problem 14. Given a $k$-dimensional vector space $V$ over $GF(q)$, what is
the order of a largest subset of $V$ every $k$ vectors of which form a basis of
$V$?


§
Theorem 89. For any prime power $q$, we have $m(2, q) = q + 1$.
§
Theorem 90.
$$m(k, q) = k + 1 \quad \text{for } q \le k.$$
§
Examples
MDS codes


14th January, 2007


60. Consider the matrix
$$A = \begin{pmatrix} 3 & 5 & 6 & 2 & 1 \\ 4 & 4 & 6 & 1 & 3 \\ 2 & 5 & 2 & 1 & 6 \end{pmatrix}$$
over $GF(7)$. Show whether maximum distance separable (MDS) codes can
be obtained from $A$. If they can, find two such codes and give either a
generator matrix or a parity check matrix for each of them. Then give the
code words and encoding functions for each.


Solution. Examine the determinants of all square submatrices of $A$, working modulo 7.
Every $1 \times 1$ submatrix is an entry of $A$, and no entry of $A$ is zero. For the
$2 \times 2$ submatrices, listing the column pairs in the order $\{1,2\}, \{1,3\}, \{1,4\}, \{1,5\},
\{2,3\}, \{2,4\}, \{2,5\}, \{3,4\}, \{3,5\}, \{4,5\}$, the determinants are

rows $\{1,2\}$: 6, 1, 2, 5, 6, 4, 4, 1, 5, 5;
rows $\{1,3\}$: 5, 1, 6, 2, 1, 2, 4, 2, 6, 4;
rows $\{2,3\}$: 5, 3, 2, 4, 6, 6, 2, 4, 2, 3.

For the $3 \times 3$ submatrices we have, for example,
$$\begin{vmatrix} 3 & 5 & 6 \\ 4 & 4 & 6 \\ 2 & 5 & 2 \end{vmatrix} = 5,$$
and, listing the column triples in the order $\{1,2,3\}, \{1,2,4\}, \{1,2,5\}, \{1,3,4\}, \{1,3,5\},
\{1,4,5\}, \{2,3,4\}, \{2,3,5\}, \{2,4,5\}, \{3,4,5\}$, the determinants are
5, 4, 5, 6, 6, 3, 3, 4, 3, 4.


Every square submatrix of $A$ is therefore non-singular. By Theorems 86 and 85, we may obtain
two MDS codes from $A$. These are the $[8, 3, 6]$ code over $GF(7)$ with the
generator matrix $G = (I_3 \mid A)$ and the $[8, 5, 4]$ code over $GF(7)$ with the
parity check matrix $H = (A \mid I_3)$.
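The determinant table above can be reproduced mechanically. The following is a minimal sketch that enumerates every square submatrix of $A$, computes its determinant by cofactor expansion over the integers, and reduces the result modulo 7. Cofactor expansion is a deliberate choice, because it is simple and adequate for matrices of at most $3 \times 3$. The names `square_minors` and `det` are illustrative.

```python
from itertools import combinations

A = [[3, 5, 6, 2, 1],
     [4, 4, 6, 1, 3],
     [2, 5, 2, 1, 6]]
P = 7

def det(m):
    """Integer determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def square_minors(a, p):
    """Map (row indices, column indices) -> determinant mod p, for every square submatrix."""
    minors = {}
    for s in range(1, min(len(a), len(a[0])) + 1):
        for rows in combinations(range(len(a)), s):
            for cols in combinations(range(len(a[0])), s):
                sub = [[a[i][j] for j in cols] for i in rows]
                minors[(rows, cols)] = det(sub) % p
    return minors

minors = square_minors(A, P)
# 15 minors of order 1, 30 of order 2 and 10 of order 3, none of them zero mod 7
print(len(minors), all(v != 0 for v in minors.values()))  # 55 True
```

Indices in the dictionary keys are zero-based. For example, `minors[((0, 1, 2), (0, 1, 2))]` is the first $3 \times 3$ determinant in the list above.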


# 
For the $[8, 3, -]$ code, the generator matrix is
$$G = \begin{pmatrix} 1 & 0 & 0 & 3 & 5 & 6 & 2 & 1 \\ 0 & 1 & 0 & 4 & 4 & 6 & 1 & 3 \\ 0 & 0 & 1 & 2 & 5 & 2 & 1 & 6 \end{pmatrix}$$


God’s Ayudhya’s Defence 14 January, 2007 91 


Coding Theory, notes and projections from lecture Kit Tyabandha, PhD 


# 
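Then,
$$(a_1\ a_2\ a_3\ a_4\ a_5\ a_6\ a_7\ a_8) = (a_1\ a_2\ a_3) \begin{pmatrix} 1 & 0 & 0 & 3 & 5 & 6 & 2 & 1 \\ 0 & 1 & 0 & 4 & 4 & 6 & 1 & 3 \\ 0 & 0 & 1 & 2 & 5 & 2 & 1 & 6 \end{pmatrix}$$
and the encoding functions become
$$\begin{aligned} a_4 &= 3a_1 + 4a_2 + 2a_3 \\ a_5 &= 5a_1 + 4a_2 + 5a_3 \\ a_6 &= 6a_1 + 6a_2 + 2a_3 \\ a_7 &= 2a_1 + a_2 + a_3 \\ a_8 &= a_1 + 3a_2 + 6a_3 \end{aligned}$$
with all arithmetic modulo 7. The code consists of all $7^3 = 343$ such code words.
Among them are the rows of $G$ and their pairwise sums,
$$\{10035621, 01044613, 00125216, 11002534, 10153130, 01162122\} \subseteq C.$$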
# 


For the $[8, 5, -]$ code, from the parity check matrix $H = (A \mid I_3)$ we know that the
generator matrix is
$$G = (I_5 \mid -A^T) = \begin{pmatrix} 1&0&0&0&0&4&3&5 \\ 0&1&0&0&0&2&3&2 \\ 0&0&1&0&0&1&1&5 \\ 0&0&0&1&0&5&6&6 \\ 0&0&0&0&1&6&4&1 \end{pmatrix}$$
Then,
$$(a_1\ a_2\ a_3\ a_4\ a_5\ a_6\ a_7\ a_8) = (a_1\ a_2\ a_3\ a_4\ a_5)\, G$$
and the encoding functions become
$$\begin{aligned} a_6 &= 4a_1 + 2a_2 + a_3 + 5a_4 + 6a_5 \\ a_7 &= 3a_1 + 3a_2 + a_3 + 6a_4 + 4a_5 \\ a_8 &= 5a_1 + 2a_2 + 5a_3 + 6a_4 + a_5 \end{aligned}$$


# 
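The code consists of all $7^5 = 16807$ such code words. Among them are the rows of $G$
and their pairwise sums, which are listed below with the check symbols first, in the order
$a_6 a_7 a_8 a_1 a_2 a_3 a_4 a_5$:
43510000, 23201000, 11500100, 56600010, 64100001,
66011000, 54310100, 22410010, 30610001, 34001100,
02101010, 10301001, 60400110, 05600101, 43000011.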
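Both codes are small enough to enumerate completely. This gives a concrete check on the encoding functions, on the MDS claim ($d = n - k + 1$) and on Corollary 88[1], which predicts $\binom{8}{d}(7 - 1)$ code words of minimum weight. The sketch below writes code words in the order $a_1 \cdots a_8$ and uses `math.comb` (Python 3.8 or later). The names `G1`, `G2`, `encode` and `weight_counts` are illustrative.

```python
from itertools import product
from math import comb

P = 7
A = [[3, 5, 6, 2, 1],
     [4, 4, 6, 1, 3],
     [2, 5, 2, 1, 6]]

def identity(k):
    return [[int(i == j) for j in range(k)] for i in range(k)]

# [8,3] code: G1 = (I_3 | A); [8,5] code: G2 = (I_5 | -A^T) reduced mod 7
G1 = [identity(3)[i] + A[i] for i in range(3)]
G2 = [identity(5)[i] + [(-A[r][i]) % P for r in range(3)] for i in range(5)]

def encode(message, g, p=P):
    """Code word message * g over GF(p), in the order a_1 ... a_n."""
    n = len(g[0])
    return tuple(sum(m * g[i][j] for i, m in enumerate(message)) % p for j in range(n))

def weight_counts(g, p=P):
    """Number of code words of each Hamming weight, found by encoding all p^k messages."""
    counts = {}
    for msg in product(range(p), repeat=len(g)):
        w = sum(1 for s in encode(msg, g, p) if s)
        counts[w] = counts.get(w, 0) + 1
    return counts

w1, w2 = weight_counts(G1), weight_counts(G2)
d1 = min(w for w in w1 if w > 0)
d2 = min(w for w in w2 if w > 0)
print(d1, w1[d1], comb(8, d1) * (P - 1))  # 6 168 168
print(d2, w2[d2], comb(8, d2) * (P - 1))  # 4 420 420
```

Enumerating the $7^5$ messages of the $[8, 5]$ code takes well under a second. For larger codes one would instead use the MDS weight distribution formula rather than brute force.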
Cryptography
17th February 2006


Definition 101. A cryptosystem is a system which modifies a message in
such a way that it becomes unintelligible to anyone but the intended recipient.
The process used in carrying this out is called encryption. A message thus
encrypted is called ciphertext. The process by which a ciphertext is turned
back into plaintext is called decryption. The art and science of encrypting
messages is called cryptography, whereas that of decrypting ciphertext without
the key is called cryptanalysis. Both cryptography and cryptanalysis make
up a branch of mathematics called cryptology. Let $m$ be a plaintext message,
also denoted by $p$, $e(\cdot)$ the encryption, $d(\cdot)$ the decryption, $c$ an encrypted
string, also known as cipher, ciphertext, or cryptogram, and $k$ a key, that is,
a set of parameters. Then $d(e(m)) = m$ and $c = e(m, k)$. The range of all
possible values of the key is called the keyspace.

§


Definition 102. There are two kinds of key-based algorithms, namely
symmetric and public-key algorithms. Symmetric algorithms use the same
key for both encryption and decryption. They are also known as secret-key,
single-key, or one-key algorithms. There are two kinds of symmetric algorithms,
stream and block ciphers. Stream algorithms work on a single bit at a time,
while block algorithms work on a group of bits. Public-key algorithms use
different keys for encryption and decryption. The encryption key is called the
public key, and the decryption key the private key. Encryption using the public
key is denoted by $e_k(p) = c$, and decryption using the corresponding private key
by $d_k(c) = p$. On the other hand, encryption using the private key and decryption
using the public key, as in the case of digital signatures, are denoted respectively
as e,,(-) and dx. (-). 

§


Definition 103. An attempted cryptanalysis is called an attack. A successful
attack is called a method. Assuming the encryption algorithm is known,
there are six types of cryptanalytic attack, namely

a. Ciphertext-only attack. Here, given $c_i = e_k(p_i)$, $i = 1, \ldots, n$, we deduce
either the $p_i$, $k$, or an algorithm $a$ that gives $p_{n+1}$ from $c_{n+1} = e_k(p_{n+1})$, in
other words $a: (c = e_k(p)) \mapsto p$.

b. Known-plaintext attack. Here, given $c_i = e_k(p_i)$ and the corresponding
$p_i$, we deduce either $k$ or $a: (c = e_k(p)) \mapsto p$.

c. Chosen-plaintext attack. Here, choosing $p_i$, we are given $c_i = e_k(p_i)$ and
deduce either $k$ or $a: (c = e_k(p)) \mapsto p$.

d. Adaptive-chosen-plaintext attack. Here we choose $p_i$, the choice of which is
based on the results $c_j$, $j < i$, of previous encryptions; we are provided
with $c_i = e_k(p_i)$ and try to deduce either $k$ or $a: (c = e_k(p)) \mapsto p$.

e. Chosen-ciphertext attack. Here, choosing $c_i$, we are given the corresponding
$p_i = d_k(c_i)$ and try to deduce $k$.
f. Chosen-key attack. In this case you are given the key. So it is not in fact
an attack, but rather only a decryption.


§ 


Definition 104. An algorithm that is unbreakable in practice is said to 
be secure. A secure algorithm can be unconditionally secure if there is not 
enough information to recover the plaintext no matter how much ciphertext 
one may have, or it can be computationally secure, or simply strong, if it 
cannot be broken with available resources. The amount of computing power 
and time required to recover the encryption key is called the work factor. 


§ 


Definition 105. A substitution cipher is one in which each letter in the
plaintext is replaced by another letter in the ciphertext. There are four types
of substitution cipher, namely
a. A simple substitution cipher. This is the case where the character replacements
are one-to-one. In other words, $p^i \overset{\text{one-one}}{\longleftrightarrow} c^i$.
b. A homophonic substitution cipher. Here the mapping of characters is
one-to-many, that is, each plaintext character $p^i$ may be replaced by any one of
several ciphertext characters.
c. A polyalphabetic substitution cipher. This is when there is a set of
simple substitution ciphers, one of which is used for each character mapping, that is to say,
a set of one-to-one mappings $\{p^i \overset{\text{one-one}}{\longleftrightarrow} c^i\}$.
d. A polygramme substitution cipher. This is the case where substitution is
done on blocks of characters instead of on single letters, so that a block of plaintext
characters is mapped one-to-one to a block of ciphertext characters.
§


Example 34. The Caesar cipher is a simple substitution cipher in which
each plaintext character is replaced by the character three to its right modulo
26, that is, $c^i = (p^i + 3) \bmod 26$, computed in $\mathbb{Z}_{26}$.
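The following is a minimal sketch of the Caesar cipher of Example 34. It assumes lowercase English letters numbered 0 to 25 and leaves every other character (spaces, punctuation) unchanged. A shift of $-3$ undoes the encryption. The function name `caesar` and the `shift` parameter are illustrative.

```python
import string

ALPHABET = string.ascii_lowercase  # letters a..z, indexed 0..25

def caesar(text, shift=3):
    """Replace each lowercase letter by the one `shift` places to its right, modulo 26."""
    out = []
    for ch in text:
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)  # non-letters pass through unchanged
    return "".join(out)

ciphertext = caesar("attack at dawn")
print(ciphertext)              # dwwdfn dw gdzq
print(caesar(ciphertext, -3))  # attack at dawn
```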


Example 35. ROT13 is a simple encryption programme commonly found
on UNIX systems. Its procedure is shown in Algorithm 6.


Algorithm 6 ROT13

given: a character c
if c ∈ {a, ..., m, A, ..., M} then
  c ← the letter 13 places after c
else if c ∈ {n, ..., z, N, ..., Z} then
  c ← the letter 13 places before c
endif
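Algorithm 6 translates almost line for line into Python. The sketch below assumes plain ASCII letters and returns any other character unchanged. Because 13 is half of 26, applying ROT13 twice gives back the original text. The names `rot13_char` and `rot13` are illustrative.

```python
def rot13_char(c):
    """Algorithm 6 for one character; anything that is not an ASCII letter is returned unchanged."""
    if "a" <= c <= "m" or "A" <= c <= "M":
        return chr(ord(c) + 13)
    if "n" <= c <= "z" or "N" <= c <= "Z":
        return chr(ord(c) - 13)
    return c

def rot13(text):
    return "".join(rot13_char(c) for c in text)

print(rot13("Hello, World!"))  # Uryyb, Jbeyq!
```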


Definition 106. A transposition cipher is one in which the letters in the 
plaintext remain the same while their order is changed. 


§ 


Example 36. In a simple columnar transposition cipher we write the plaintext
horizontally on a piece of graph paper of fixed width. The ciphertext is
then read off vertically.
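The graph-paper procedure of Example 36 can be sketched in code, under two assumptions: no padding is added to the last row, and spaces, if present, are treated as ordinary characters. Decryption recomputes how many characters fall in each column. The function names `columnar_encrypt` and `columnar_decrypt` are illustrative.

```python
def columnar_encrypt(plaintext, width):
    """Write the text in rows of `width` characters, then read it off column by column."""
    rows = [plaintext[i:i + width] for i in range(0, len(plaintext), width)]
    return "".join(row[c] for c in range(width) for row in rows if c < len(row))

def columnar_decrypt(ciphertext, width):
    """Invert columnar_encrypt: cut the ciphertext into columns, then read across the rows."""
    full_rows, extra = divmod(len(ciphertext), width)
    # the first `extra` columns hold one more character than the others
    col_lengths = [full_rows + (1 if c < extra else 0) for c in range(width)]
    cols, pos = [], 0
    for length in col_lengths:
        cols.append(ciphertext[pos:pos + length])
        pos += length
    return "".join(cols[c][r] for r in range(full_rows + 1)
                   for c in range(width) if r < len(cols[c]))

print(columnar_encrypt("WEAREDISCOVERED", 5))  # WDVEIEASRRCEEOD
```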
Definition 107. A one-time pad encryption algorithm is one which uses a
non-repeating set of random key letters.


§ 
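The following is a minimal sketch of a one-time pad in the sense of Definition 107. It assumes messages over the 26 capital letters, with each key letter added modulo 26 (a Vernam-style letter pad). Python's standard `secrets` module supplies the random key letters. A pad must be at least as long as the message and must never be reused. The names `random_pad` and `otp` and the `sign` parameter are illustrative.

```python
import secrets
import string

ALPHABET = string.ascii_uppercase

def random_pad(length):
    """Draw a fresh, uniformly random key letter for every message letter."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def otp(text, pad, sign=1):
    """Encrypt (sign=1) or decrypt (sign=-1) by adding or subtracting pad letters mod 26."""
    if len(pad) < len(text):
        raise ValueError("the pad must be at least as long as the message")
    return "".join(ALPHABET[(ALPHABET.index(t) + sign * ALPHABET.index(k)) % 26]
                   for t, k in zip(text, pad))

message = "ONETIMEPAD"
pad = random_pad(len(message))
ciphertext = otp(message, pad)
print(otp(ciphertext, pad, sign=-1) == message)  # True
```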


[Figure 1: message source → source encoder → channel encoder → channel (noise) → channel decoder → source decoder → receiver]
Figure 1 Encoding and decoding of a message. The channel encoder introduces
redundancy, which lets the channel decoder detect and correct errors.
Criteria for designing a channel encoding algorithm and for constructing
the encoder and the decoder are fast encoding and decoding of
messages, easy transmission of encoded messages, maximum rate of transfer
of information, and maximum error detection or correction capability.
Definition 1 Code


Let $A = \{a_1, \ldots, a_q\}$ be a code alphabet of size $q$, whose elements are the code
symbols. We call a $q$-ary word of length $n$ over $A$ a sequence $w = w_1 \cdots w_n$,
or equivalently a vector $(w_1, \ldots, w_n)$, where $w_i \in A$ for all $i$. We call a $q$-ary
block code of length $n$ over $A$ a nonempty set $C$ of $q$-ary words, that is, code
words, all of which are of the same length $n$. The number of code words $C$
contains is the size $m$ of $C$; consequently $m = |C|$. The information rate of
the code $C$ is
$$\frac{\log_q |C|}{n}.$$
We call an $(n, m)$-code a code of length $n$ and size $m$.
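The information rate of Definition 1 is a one-line computation once a code is given explicitly. The sketch below represents a code as a Python set of equal-length strings, which is a representation chosen only for illustration, and checks it against the binary repetition code of length 3, whose rate is $1/3$. The name `information_rate` is illustrative.

```python
import math

def information_rate(code, q):
    """log_q(|C|) / n for a q-ary block code given as a set of equal-length words."""
    lengths = {len(w) for w in code}
    if len(lengths) != 1:
        raise ValueError("a block code needs a nonempty set of words of one length")
    n = lengths.pop()
    return math.log(len(code), q) / n

print(information_rate({"000", "111"}, 2))  # about 0.333, the rate of the length-3 repetition code
```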
Example 1 Codes


A code over the code alphabet $F_2 = \{0, 1\}$ is called a binary code, and one over
$F_3 = \{0, 1, 2\}$ is called a ternary code. The term quaternary code refers to a
code over either $F_4 = \{0, 1, 2, 3\}$ or $\mathbb{Z}_4 = \{0, 1, 2, 3\}$.
Definition 2 Communication channel


A communication channel consists of a finite channel alphabet $A = \{a_1, \ldots, a_q\}$
together with a set of forward channel probabilities $p_{a_i a_j}$ such that for all $i$
$$\sum_{j=1}^{q} p_{a_i a_j} = 1,$$
where $p_{a_i a_j}$ is the conditional probability that $a_j$ is received, given that $a_i$ is
sent. Suppose that $x$ is the word received when a word $c$ was sent, that $e$ is the number of
places where $x$ and $c$ differ, and that $n$ is the length of each word. If each symbol is
received in error with probability $p$, as in the binary symmetric channel of Example 2,
then the forward channel probability is
$$p_{cx} = p^e (1 - p)^{n - e}.$$
Definition 3 Memoryless channel


Let $c = c_1 \cdots c_n$ and $x = x_1 \cdots x_n$ be words of length $n$. Then a communication
channel is said to be memoryless if
$$p_{cx} = \prod_{i=1}^{n} p_{c_i x_i}.$$
Definition 4 Symmetric channel


A memoryless channel with a channel alphabet of size $q$ is called a $q$-ary
symmetric channel if each symbol transmitted has the same probability $p < \frac{1}{2}$
of being received in error, and whenever a wrong symbol is received, each of
the $q - 1$ possible errors is equally likely.


If $p \ge \frac{1}{2}$, the channel is known to be useless.
Example 2 Binary symmetric channel
The binary symmetric channel (BSC) is a memoryless channel having channel
alphabet $\{0, 1\}$ and channel probabilities $p_{01} = p_{10} = p$ and $p_{00} = p_{11} =
1 - p$.
This probability $p$ of a bit error in a BSC is called the cross-over probability
of the BSC.
Example 3 The most likely word sent
When a received word $x$ is not among the vocabulary of the code, the most
likely word sent is the code word $c_i$ whose $p_{c_i x}$ is maximum over all $i = 1, \ldots, m$.


A rule for finding the most likely code word sent in case of an error is called
a decoding rule.
Definition 5 Maximum likelihood decoding


The maximum likelihood decoding rule decodes $x$ to $c_x$, where
$$p_{c_x x} = \max_{c \in C} p_{cx}$$
and $x$ is the word received.
Algorithm 1 Decoding algorithm


for all words x_i received do
  if x_i is not a valid code word then
    c_i ← the most likely c according to the decoding rule
  else
    c_i ← x_i
  endif
endfor
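Algorithm 1 with the maximum likelihood rule of Definition 5 can be sketched for a binary symmetric channel, where $p_{cx} = p^e(1 - p)^{n - e}$. Ties between several most likely code words are handled by incomplete decoding (Example 4) and reported as `None`. The function names and the three-word example code are hypothetical and are used only for illustration.

```python
def forward_probability(c, x, p):
    """p_{cx} = p^e (1 - p)^(n - e) on a BSC, where e is the number of differing places."""
    e = sum(ci != xi for ci, xi in zip(c, x))
    return p ** e * (1 - p) ** (len(c) - e)

def ml_decode(x, code, p):
    """All code words of maximum forward probability for the received word x."""
    best = max(forward_probability(c, x, p) for c in code)
    return [c for c in code if forward_probability(c, x, p) == best]

def decode_all(received, code, p):
    """Algorithm 1: keep valid code words; otherwise take the unique most likely one."""
    out = []
    for x in received:
        if x in code:
            out.append(x)
        else:
            candidates = ml_decode(x, code, p)
            # incomplete decoding: a tie means we ask for retransmission (None)
            out.append(candidates[0] if len(candidates) == 1 else None)
    return out

code = ["00000", "11100", "00111"]
print(decode_all(["11100", "10100", "10110"], code, p=0.1))  # ['11100', '11100', None]
```

The received word 10110 is at distance 2 from both 11100 and 00111, so the two code words are equally likely and the word is rejected.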
Example 4


There are two kinds of maximum likelihood decoding, which differ when more
than one code word has the same maximum likelihood. Complete maximum
likelihood decoding chooses one of them arbitrarily, while incomplete maximum
likelihood decoding rejects all of them and asks for a retransmission.
Definition 6 Hamming distance


Let $x = x_1 \cdots x_n$ and $y = y_1 \cdots y_n$ be words of length $n$ over an alphabet $A$.
Then the Hamming distance between $x$ and $y$, denoted $d(x, y)$, is the number
of places where $x$ and $y$ differ from each other, that is
$$d(x, y) = d(x_1, y_1) + \cdots + d(x_n, y_n),$$
where
$$d(x_i, y_i) = \begin{cases} 0 & \text{if } x_i = y_i, \\ 1 & \text{if } x_i \ne y_i. \end{cases}$$
Theorem 1 Hamming distance and the forward channel probability


On a binary symmetric channel with cross-over probability $p$, the Hamming distance
$d(x, c) = i$ corresponds to the forward channel probability
$$p_{cx} = p^i (1 - p)^{n - i}.$$
Proof. This is obvious from Definitions 2 and 6. □
Example 5
From Definition 6 it follows that
$$0 \le d(x, y) \le n,$$
$d(x, y) = 0$ if and only if $x = y$, and
$$d(x, y) = d(y, x).$$
Example 6
Let $A$ be the Roman alphabet.


If $x$ = ‘breed’, $y$ = ‘bread’, and $z$ = ‘break’, then
$d(x, y) = d(y, z) = 1$, and $d(x, z) = 2$.
On the other hand, if $A = \{0, 1, 2, 3, 4, 5, 6\}$, $p = 24601$ and $q = 54321$, then
$$d(p, q) = 3.$$
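Definition 6 translates directly into code. The following is a minimal sketch, assuming words are given as Python strings of equal length, that reproduces the distances of Example 6. The name `hamming_distance` is illustrative.

```python
def hamming_distance(x, y):
    """Number of places where the words x and y differ (Definition 6)."""
    if len(x) != len(y):
        raise ValueError("words must have the same length")
    return sum(1 for xi, yi in zip(x, y) if xi != yi)

print(hamming_distance("breed", "bread"),
      hamming_distance("bread", "break"),
      hamming_distance("breed", "break"),
      hamming_distance("24601", "54321"))  # 1 1 2 3
```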
Theorem 2


Let $x$, $y$ and $z$ be words of length $n$ over $A$. Then the triangle inequality
holds for their mutual Hamming distances, that is
$$d(x, z) \le d(x, y) + d(y, z).$$
Proof. Let $a = d(x, z)$, $b = d(x, y)$, and $c = d(y, z)$. We have $a \ge 0$, $b \ge 0$
and $c \ge 0$.


What this theorem states is obvious when $a = 0$.


If $a > 0$, then either $b = 0$ or $b > 0$; if the former is the case, that is $b = 0$,
then $x = y$, so $a = c$ and the theorem is true.


If both $a > 0$ and $b > 0$, then either $c = 0$ or $c > 0$; if $c = 0$, then $y = z$, so $a = b$ and
the theorem is again true.
But if $a > 0$, $b > 0$ and $c > 0$, then $a$, $b$ and $c$ may come from some of the
differences in common, as shown in the following Venn diagram.


[Figure 2: Venn diagram of three overlapping regions labelled a, b and c, with seven areas numbered 1 to 7.]
Figure 2 Common differences among a, b and c.


Let $(a)$ denote the number of positions counted in $a$ only, $(a, b)$ the number
counted in both $a$ and $b$ but not in $c$, $(a, b, c)$ the number counted in all three,
and similarly for the others.


Then from Figure 2 the area 1 is $(a)$; 2, $(b)$; 3, $(c)$; 4, $(a, c)$; 5, $(a, b)$; 6, $(b, c)$;
and 7, $(a, b, c)$.


Then $d(x, z) = (a) + (a, b) + (a, c) + (a, b, c)$,
$d(x, y) = (b) + (a, b) + (b, c) + (a, b, c)$, and
$d(y, z) = (c) + (a, c) + (b, c) + (a, b, c)$.
Moreover $(a) = 0$, since a position where $x_i = y_i$ and $y_i = z_i$ also has $x_i = z_i$.


Therefore $d(x, y) + d(y, z) = (b) + (c) + (a, b) + (a, c) + 2(b, c) + 2(a, b, c)$,
which is never less than $(a, b) + (a, c) + (a, b, c) = d(x, z)$, and hence the theorem is again
true. This exhausts all the cases and the theorem is proved. □
Definition 7 Minimum distance decoding


The minimum distance (or nearest neighbour) decoding rule decodes $x$ to $c_x$ if
$$d(x, c_x) = \min_{c \in C} d(x, c).$$
Theorem 3 Maximum likelihood and minimum distance decoding rules


The maximum likelihood decoding rule and the minimum distance decoding
rule are the same for a BSC with cross-over probability
$$p < \frac{1}{2}.$$
Proof. From Theorem 1, when $p < \frac{1}{2}$, we have
$$p^0 (1 - p)^n > p^1 (1 - p)^{n-1} > \cdots > p^n (1 - p)^0,$$
since each term is $p / (1 - p) < 1$ times the previous one.
Thus the smaller the distance, the greater the likelihood, and the theorem is
proved. □
Definition 8 Distance of a code


Let $C$ be a code containing at least two words. Then the minimum distance,
or the distance, of $C$ is
$$d(C) = \min\{d(x, y) \mid x, y \in C,\ x \ne y\}.$$
A code of length $n$, size $m$, and distance $d$ is called
an $(n, m, d)$-code.
Definition 9 Error vector


Let a code word be of length $n$.


Then an error vector of weight $k$ is a word of length $n$ which takes the value 1 in
the $k$ positions where errors occurred, the remaining positions of the word
being zero.


An error vector is also called an error word or an error pattern.
Definition 10 Detected and undetected errors
An error vector $e$ is said to be detected by a code if $a + e$ is not a code word for
any code word $a$.


If there exists some code word $a$ such that $a + e$ is also a code word, we say
that the error vector $e$ goes undetected.
Definition 11 u-error-detecting code


Let a received word $x$ differ from the code word $c$ actually sent in $e$ places.
Then the corresponding code $C$ is said to be $u$-error-detecting if $x$ is not a
code word whenever $1 \le e \le u$.


Moreover, $C$ is exactly $u$-error-detecting if it is $u$-error-detecting but not
$(u + 1)$-error-detecting.
Theorem 4 Distance of a u-error-detecting code


A code $C$ is $u$-error-detecting if and only if
$$d(C) \ge u + 1.$$
Proof. Let $c \in C$.


If $d(C) \ge u + 1$, then any $x$ such that $1 \le d(x, c) \le u < d(C)$ satisfies $x \notin C$;
therefore $C$ is $u$-error-detecting.


On the other hand, if $d(C) < u + 1$, that is $d(C) \le u$,
then there exist $c_1, c_2 \in C$ such that
$1 \le d(c_1, c_2) = d(C) \le u$. It is then possible to send $c_1$ and incur errors
such that the received word is $x = c_2$, with
$1 \le d(x, c_1) = d(c_2, c_1) \le u$; hence $C$ is not a $u$-error-detecting
code. □
Corollary 4[1] Distance of a u-error-detecting code
A code with distance $d$ is exactly $(d - 1)$-error-detecting.
Definition 12 v-error-correcting code


Let $v$ be a positive integer, and assume that the incomplete decoding rule is
used. Then a code $C$ is said to be $v$-error-correcting if minimum distance
decoding corrects up to $v$ errors. It is said to be exactly $v$-error-correcting
if it is $v$-error-correcting but not $(v + 1)$-error-correcting.
Theorem 5 v-error-correcting code
A code $C$ is $v$-error-correcting if and only if
$$d(C) \ge 2v + 1.$$
Proof. Suppose that $d(C) \ge 2v + 1$.


Let $c \in C$ be the code word sent and $x$ the word received, and suppose that $e$ errors
occurred, with $e \le v$.


Then $d(x, c) \le v$. If $C$ were not $v$-error-correcting, there would be some
$c' \in C$, $c' \ne c$, such that $d(x, c') \le d(x, c)$, so that
$$d(x, c) + d(x, c') \le 2v.$$
But by Theorem 2, $d(x, c) + d(x, c') \ge d(c, c') \ge d(C) \ge 2v + 1$ for
all distinct $c, c' \in C$, which is a contradiction. It follows that $C$ must be $v$-error-correcting.
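Next, suppose that $C$ is $v$-error-correcting and
$$d(C) < 2v + 1.$$
Then
$$d(C) \le 2v,$$
that is to say, there exist distinct $c_1, c_2 \in C$ such that $d(c_1, c_2) \le 2v$. Changing
$\lceil d(c_1, c_2)/2 \rceil$ of the places where $c_1$ and $c_2$ differ, so that they agree with $c_2$,
turns $c_1$ into a word $x$ such that $d(x, c_1) + d(x, c_2) = d(c_1, c_2) \le 2v$ and
$d(x, c_2) \le d(x, c_1) \le v$. So $c_1$ may be sent and received as $x$ with at most $v$ errors,
and yet minimum distance decoding does not return $c_1$; hence $C$ is not $v$-error-correcting.
This contradicts what we have supposed earlier, so necessarily
$d(C) \ge 2v + 1$. □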
Corollary 5[1] v-error-correcting code


A code with distance $d$ is exactly
$$\left\lfloor \frac{d - 1}{2} \right\rfloor\text{-error-correcting},$$
where $\lfloor x \rfloor$ is the greatest integer less than or equal to $x$.
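Corollaries 4[1] and 5[1] turn the distance of a code into its exact detecting and correcting capabilities. The following is a minimal sketch, assuming a small code given as a list of equal-length strings. It finds $d(C)$ by checking every pair of distinct code words, which is only practical for small codes, and then returns $(d, d - 1, \lfloor (d - 1)/2 \rfloor)$. The function names are illustrative.

```python
from itertools import combinations

def hamming_distance(x, y):
    return sum(1 for xi, yi in zip(x, y) if xi != yi)

def code_distance(code):
    """d(C): the minimum Hamming distance over all pairs of distinct code words."""
    return min(hamming_distance(x, y) for x, y in combinations(code, 2))

def error_capability(code):
    """(d, u, v): distance, exactly-u-error-detecting and exactly-v-error-correcting values."""
    d = code_distance(code)
    return d, d - 1, (d - 1) // 2

print(error_capability(["00000", "11100", "00111"]))  # (3, 2, 1)
```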
Definition 13 probability space and expectation
A probability space is a triple $(S, B, P)$ on the domain $S$, which is a nonempty set
called the sample space, where $(S, B)$ is a measurable space, $B$ is a Borel field of
subsets of $S$, and $P$ is a measure on $S$ with the property that $P(S) = 1$ and, for all
disjoint $E_i \in B$,
$$P\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} P(E_i).$$
In other words, $P$ is a nonnegative function defined for all events $E_i \in B$, the
$B$-measurable subsets of $S$. Further, a random variable $X$ is a function mapping $S$ into
some set $R$, called the range of $X$. For convenience, we shall also use $X$ to represent
both the function and its own range, that is, $X$ is a function which maps $S$ into $X$. If
$S$ is discrete and $f$ is some real-valued function defined on $X$, then $X$ and $f(X)$
are two different random variables, and the expectation of the latter is given by
$$E[f(X)] = \sum_{x \in X} p(x) f(x).$$


Coding theory, Entropy and mutual information, 4th November 2005 -1- From
25th October 2005, as of 14th January, 2007


Kit Tyabandha, PhD Department of Mathematics, Mahidol University 


Definition 14 conditional probability


Let $p(x)$ be the probability that $x \in X$ occurs, similarly $p(y)$ that $y \in Y$
does, while $p(x, y)$ is the probability that both $x \in X$ and $y \in Y$ occur. Then
$$p(x \mid y) = \frac{p(x, y)}{p(y)} \qquad (1)$$
and
$$p(y \mid x) = \frac{p(x, y)}{p(x)} \qquad (2)$$
Definition 15 Markov chain
A Markov chain is a set of random variables $X_t$, where $t = 0, 1, \ldots$, such that
$$P(X_t = j \mid X_0 = i_0, \ldots, X_{t-1} = i_{t-1}) = P(X_t = j \mid X_{t-1} = i_{t-1}).$$
In other words, given the present state, the next state is conditionally independent
of the past.
Definition 16 convexity
A subset $K \subseteq E^n$, where $E^n$ is the Euclidean space of $n$ dimensions, is called
convex if the line segment joining any two points in $K$ is contained in $K$. If
the two points are $x_1$ and $x_2$, then the line segment joining them consists of the points
$$x = t x_1 + (1 - t) x_2,$$
where $0 \le t \le 1$.
Definition 17 convex hull


A point $x$ is said to be a convex combination of points $x_1, \ldots, x_m$ if there
exist nonnegative scalars $\alpha_1, \ldots, \alpha_m$ such that
$$\sum_{i=1}^{m} \alpha_i = 1$$
and
$$\sum_{i=1}^{m} \alpha_i x_i = x.$$
The set of all convex combinations of $x_i$, $i = 1, \ldots, m$, is called the convex
hull of $\{x_i\}_{i=1}^{m}$.
Definition 18 convex cup and cap


Let $f$ be a real-valued function, and let $K$ be a convex subset of the domain
of $f$. Then $f$ is said to be convex cup if, for every $x_1, x_2 \in K$ and $0 \le t \le 1$,
$$f(t x_1 + (1 - t) x_2) \le t f(x_1) + (1 - t) f(x_2). \qquad (3)$$
It is said to be strictly convex cup if strict inequality holds in Equation 3
whenever $x_1 \ne x_2$ and $0 < t < 1$. Similarly, $f$ is said to be convex cap if
$$f(t x_1 + (1 - t) x_2) \ge t f(x_1) + (1 - t) f(x_2), \qquad (4)$$
that is to say, if $-f$ is convex cup. It is said to be strictly convex cap if strict
inequality holds in Equation 4 whenever $x_1 \ne x_2$ and $0 < t < 1$. Convex cap is also known
as concave. Geometrically speaking, $f$ is convex cup if and only if all its chords
lie above or on the graph of $f$, and $f$ is concave if and only if all its chords lie
below or on the graph of the same.
Definition 19 Jensen’s inequality


Let $K$ be some interval in $E^1$, and let $F(x)$ be a probability distribution
concentrated on $K$ such that
$$P(X \le x) = F(x).$$
Then, if the expectation $E(X)$ exists and $f(x)$ is a convex cup function,
$$E(f(X)) \ge f(E(X)). \qquad (5)$$
If $f$ is strictly convex cup, then strict inequality holds in Equation 5 unless $X$ is
constant with probability 1. Similarly, if $f$ is convex cap, then
$$E(f(X)) \le f(E(X)). \qquad (6)$$
If $f$ is strictly convex cap, then strict inequality holds in Equation 6 unless $X$ is
constant with probability 1.
Example 7 Jensen’s inequality in geometrical terms


Suppose that in Definition 19 there is a mass distribution placed on the graph
of $f$. Then Equation 5 says that the overall centre of mass will lie above or on
the graph, while Equation 6 says that it will lie below or on it.
Axiom 1 Entropy
If the events are all equally likely, then the uncertainty function
$$h(m) = H\left(\frac{1}{m}, \ldots, \frac{1}{m}\right)$$
is monotonically increasing with $m$.
Axiom 2 Entropy

If $\{E_1^1, \ldots, E_m^1\}$ and $\{E_1^2, \ldots, E_n^2\}$ are statistically independent sets of equally
likely disjoint events, then the uncertainty of the set of events
$$\{E_i^1 \cap E_j^2 ;\ i = 1, \ldots, m;\ j = 1, \ldots, n\}$$
is
$$H\left(\frac{1}{mn}, \ldots, \frac{1}{mn}\right) = H\left(\frac{1}{m}, \ldots, \frac{1}{m}\right) + H\left(\frac{1}{n}, \ldots, \frac{1}{n}\right).$$
That is to say,
$$h(mn) = h(m) + h(n),$$
where
$$h(m) = H\left(\frac{1}{m}, \ldots, \frac{1}{m}\right).$$


Definition 20 Entropy


Let the set of $m$ possible disjoint events be $E = \{E_1, \ldots, E_m\}$. We call $p(E_i)$ the
a priori probability of $E_i$, where $1 \le i \le m$ and
$$\sum_{i=1}^{m} p(E_i) = 1.$$
The uncertainty function, or the entropy function,
$$H(p(E_1), \ldots, p(E_m))$$
obeys Axioms 1 and 2.
Theorem 6 Entropy for equally likely events
The entropy of a set of $m$ equally likely events is
$$h(m) = \lambda \log_c m,$$
where $\lambda$ is a positive constant and $c > 1$.
Proof. Proving Theorem 6 amounts to proving that Axioms 1 and 2 are
satisfied if and only if $h(m) = \lambda \log_c m$. The two axioms say that $h(m)$ is
monotonically increasing in $m$ and
$$h(mn) = h(m) + h(n). \qquad (7)$$
According to Equation 7, if $m = n = 1$, then $h(1) = h(1) + h(1)$, which implies
that $h(1) = 0$. From this, together with both axioms above, $h(m) = \lambda \log_c m$
is sufficient as a solution.
Next, we must prove that this solution is necessarily the only solution. Let
$a$, $b$ and $c$ be positive integers with $a, b, c > 1$. Then there exists a unique
integer $d$ such that
$$c^d \le a^b < c^{d+1}. \qquad (8)$$
From Equation 8 it follows that
$$d \log c \le b \log a < (d + 1) \log c,$$
and therefore
$$\frac{d}{b} \le \frac{\log a}{\log c} < \frac{d + 1}{b}. \qquad (9)$$
Since $h(m)$ is monotonically increasing, from Equation 8 we have
$$h(c^d) \le h(a^b) \le h(c^{d+1}).$$
Then from Equation 7, $d\,h(c) \le b\,h(a) \le (d + 1)\,h(c)$. And since $h(m)$ is
monotonically increasing, $h(c) > h(1) = 0$, so
$$\frac{d}{b} \le \frac{h(a)}{h(c)} \le \frac{d + 1}{b}. \qquad (10)$$
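From Equations 9 and 10 it follows that
$$\left| \frac{\log a}{\log c} - \frac{h(a)}{h(c)} \right| \le \frac{1}{b}.$$
And, since $b$ is an arbitrary positive integer,
$$\frac{h(a)}{h(c)} = \frac{\log a}{\log c}, \quad\text{that is,}\quad \frac{h(a)}{\log a} = \frac{h(c)}{\log c}.$$
Since $a$ and $c$ are arbitrary, $h(m) / \log m$ is the same positive constant for every
integer $m > 1$. Writing this constant as $\lambda / \log c$ gives $h(m) = \lambda \log_c m$, which
also holds for $m = 1$ since $h(1) = 0$.
Therefore, necessarily $h(m) = \lambda \log_c m$ is the only solution. □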


Axiom 3 grouping
The total uncertainty of events does not depend on the method of indication.
§
Axiom 4 continuity of entropy


The uncertainty measure is a continuous function of the
probabilities within it.


§
Example 8 grouping events
Let a set $E$ of $m$ disjoint events be $\{E_1, \ldots, E_m\}$. Let $j_i$, $i = 0, \ldots, n$, be integers
with $0 = j_0 < j_1 < j_2 < \cdots < j_n = m$, and let $E$ be divided into $n$ sets of events, namely
$$\begin{aligned} G_1 &= \{E_1, \ldots, E_{j_1}\} \\ G_2 &= \{E_{j_1 + 1}, \ldots, E_{j_2}\} \\ &\;\;\vdots \\ G_n &= \{E_{j_{n-1} + 1}, \ldots, E_{j_n}\} \end{aligned}$$
If we indicate first the group, and then the event within that group, then the
uncertainty becomes
$$H(p(E_1), \ldots, p(E_m)) = H(p(G_1), \ldots, p(G_n)) + \sum_{i=1}^{n} p(G_i)\, H\bigl(p(E_{j_{i-1} + 1} \mid G_i), \ldots, p(E_{j_i} \mid G_i)\bigr). \qquad (11)$$
The grouping axiom, Axiom 3, lets us express the uncertainty when all the
event probabilities are rational. By grouping equally likely events together
and then considering each group as a single event, it lets us
deal with events which are not equally likely. Example 9 shows
how this is done. Axiom 4 then extends the result to irrational
probabilities as well, and Equation 12 is the result.




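Example 9 entropy of groups of events
As in Example 8, let a set of disjoint events be $E = \{E_1, \ldots, E_m\}$, and
let $p(E_i) = \frac{1}{m}$, $i = 1, \ldots, m$. Also, let the groups of events $G_1, \ldots, G_n$ be
defined in the same way as therein. Let $n_k$ be the number of events in $G_k$. Then
$n_k = j_k - j_{k-1}$ and $p(G_k) = \frac{n_k}{m}$, for $k = 1, \ldots, n$, and also $p(E_i \mid G_k) = \frac{1}{n_k}$
for $j_{k-1} < i \le j_k$. Then Equation 11 yields $h(m) = H(p(G_1), \ldots, p(G_n)) +
\sum_{k=1}^{n} p(G_k) h(n_k)$. Since $h(m) = \lambda \log_c m$ by Theorem 6, and $\sum_{k=1}^{n} p(G_k) = 1$, we have
$$\begin{aligned} H(p(G_1), \ldots, p(G_n)) &= -\sum_{k=1}^{n} p(G_k)\bigl(h(n_k) - h(m)\bigr) \\ &= -\lambda \sum_{k=1}^{n} p(G_k) \log_c \frac{n_k}{m} = -\lambda \sum_{k=1}^{n} p(G_k) \log_c p(G_k). \end{aligned} \qquad (12)$$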
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Example 10 scaling the entropy 


From
$$h(m) = \lambda \log_b m$$
if we let
$$\lambda = \log_c b$$
then
$$h(m) = \log_c m$$

In other words, the scale factor $\lambda$ can be absorbed into the base of the logarithm.
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Theorem 7 entropy 
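Let $\{p_1, \ldots, p_m\}$ be a set of probabilities such that
$$\sum_{i=1}^{m} p_i = 1$$
Then
$$H(p_1, \ldots, p_m) = -\sum_{i=1}^{m} p_i \log p_i \qquad (13)$$

Proof. This follows from the results of Examples 8 and 9, extended to irrational probabilities by Axiom 4, and the scale factor $\lambda$ disappears in a manner similar to that shown in Example 10. □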
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Example 11 units of entropy 


If the base of the logarithm in Equation 13 is 2, the unit of the entropy is bit. 
On the other hand if this base is e, that is to say, if we use natural logarithms, 
then the uncertainty has the unit of nat. From this, one may see that one nat 
is equal to $\log_2 e$ bits, which is approximately 1.443 bits. The term bit comes
from binary digit, the term nat from natural digit. 
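The following minimal Python sketch evaluates Equation 13 for a small distribution in both units, assuming the probabilities are given as a list summing to 1; the helper name `entropy` is ours, not from the text. It confirms numerically that the ratio of bits to nats is $\log_2 e$.

```python
import math

def entropy(probs, base=2.0):
    """Equation 13: -sum p log p, skipping zero probabilities (0 log 0 = 0)."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

dist = [0.5, 0.25, 0.125, 0.125]
h_bits = entropy(dist, 2)          # 1.75 bits, up to rounding
h_nats = entropy(dist, math.e)     # the same uncertainty measured in nats
print(h_bits, h_nats, h_bits / h_nats)   # the ratio is log2(e), about 1.443
```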
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Definition 21 conditional entropy 
The conditional entropy of $X$, given some $y \in Y$, is
$$H(X \mid y) = -\sum_{x} p(x \mid y) \log p(x \mid y) \qquad (14)$$
Then the conditional entropy $H(X \mid Y)$ is the expectation, or average value, of $H(X \mid y)$ over the range $Y$. In other words,
$$H(X \mid Y) = \sum_{y} p(y)\, H(X \mid y) \qquad (15)$$
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Theorem 8 conditional entropy 


The conditional entropy is
$$H(X \mid Y) = -\sum_{x,y} p(x, y) \log p(x \mid y)$$
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Proof. Putting the equation of conditional entropy when y is given, Equation 
14, into the overall conditional entropy equation, Equation 15, we get
$$H(X \mid Y) = \sum_{y} p(y)\, H(X \mid y) = -\sum_{y} p(y) \sum_{x} p(x \mid y) \log p(x \mid y)$$
Then from Equation 1 of Definition 14, $p(y)\, p(x \mid y) = p(x, y)$, and so
$$H(X \mid Y) = -\sum_{x,y} p(x, y) \log p(x \mid y)$$
□
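A small numerical check of Theorem 8, assuming a joint distribution stored as a Python dictionary `{(x, y): p(x, y)}`; the table values and function names are illustrative only. Both functions should return the same number, one via Equations 14 and 15, the other via the formula of Theorem 8.

```python
import math

# Hypothetical joint distribution p(x, y), stored as {(x, y): probability}.
joint = {(0, 0): 0.3, (1, 0): 0.1, (0, 1): 0.2, (1, 1): 0.4}

def marginal_y(p):
    """p(y) = sum over x of p(x, y)."""
    py = {}
    for (_, y), pxy in p.items():
        py[y] = py.get(y, 0.0) + pxy
    return py

def cond_entropy_by_definition(p):
    """Equations 14 and 15: H(X|Y) = sum over y of p(y) H(X|y), in bits."""
    total = 0.0
    for y, py in marginal_y(p).items():
        cond = [pxy / py for (_, yy), pxy in p.items() if yy == y and pxy > 0]
        total += py * -sum(c * math.log2(c) for c in cond)
    return total

def cond_entropy_by_theorem(p):
    """Theorem 8: H(X|Y) = -sum over x, y of p(x, y) log p(x|y), in bits."""
    py = marginal_y(p)
    return -sum(pxy * math.log2(pxy / py[y]) for (_, y), pxy in p.items() if pxy > 0)

print(cond_entropy_by_definition(joint), cond_entropy_by_theorem(joint))
```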
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Theorem 9 conditional entropy 
Let $X$, $Y$ and $Z$ be discrete random variables. For each $z \in Z$, let
$$E(z) = \sum_{x,y} p(y)\, p(z \mid x, y)$$
Then
$$H(X \mid Y) \le H(Z) + \mathrm{E}[\log E(Z)] = H(Z) + \sum_{z} p(z) \log E(z)$$
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Proof. 
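$$H(X \mid Y) = -\mathrm{E}[\log p(x \mid y)] = -\sum_{x,y,z} p(x, y, z) \log p(x \mid y) = \sum_{z} p(z) \sum_{x,y} \frac{p(x, y, z)}{p(z)} \log \frac{1}{p(x \mid y)}$$
Because
$$\frac{p(x, y, z)}{p(z)} = p(x, y \mid z)$$
is a probability distribution over $(x, y)$, and the logarithm is a convex cap function, we may apply Equation 6, namely Jensen's inequality for convex cap functions, from Definition 19. Hence
$$H(X \mid Y) \le \sum_{z} p(z) \log \sum_{x,y} \frac{p(x, y, z)}{p(z)\, p(x \mid y)} = \sum_{z} p(z) \log \frac{1}{p(z)} + \sum_{z} p(z) \log \sum_{x,y} \frac{p(x, y, z)}{p(x \mid y)}$$
But
$$\frac{p(x, y, z)}{p(x \mid y)} = \frac{p(x, y, z)\, p(y)}{p(x, y)} = p(y)\, p(z \mid x, y)$$
so the inner sum is $E(z)$; the first term is $H(Z)$ and the second is $\mathrm{E}[\log E(Z)]$, hence the statement above is proved. □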
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Corollary 9[1] Fano’s inequality 
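Let $X$ and $Y$ be random variables each of which takes values in the set $\{x_1, \ldots, x_r\}$. Let
$$P_e = P(X \ne Y)$$
Then
$$H(X \mid Y) \le H(P_e) + P_e \log(r - 1)$$
where $H(P_e)$ stands for $H(P_e, 1 - P_e)$.

Proof. In Theorem 9, let $Z = 0$ if $X = Y$, and let $Z = 1$ if $X \ne Y$. Then $E(0) = 1$ and $E(1) = r - 1$, so that $H(Z) = H(P_e)$ and $\mathrm{E}[\log E(Z)] = (1 - P_e) \log 1 + P_e \log(r - 1) = P_e \log(r - 1)$. □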
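Fano's inequality can be spot-checked numerically. The sketch below, with hypothetical joint tables over $\{0, \ldots, r-1\}^2$ and $r \ge 2$, computes both sides. For a uniform input through a symmetric channel the two sides coincide, which shows the bound can be tight; the function names are ours.

```python
import math

def h2(p):
    """H(p, 1 - p) in bits, with 0 log 0 = 0."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def fano_sides(joint):
    """Return (H(X|Y), H(Pe) + Pe log2(r - 1)) for an r-by-r table joint[x][y]."""
    r = len(joint)
    py = [sum(joint[x][y] for x in range(r)) for y in range(r)]
    h_cond = -sum(joint[x][y] * math.log2(joint[x][y] / py[y])
                  for x in range(r) for y in range(r) if joint[x][y] > 0)
    pe = sum(joint[x][y] for x in range(r) for y in range(r) if x != y)
    return h_cond, h2(pe) + pe * math.log2(r - 1)

# Uniform input through a symmetric channel: correct with probability 1 - eps,
# otherwise each of the r - 1 wrong symbols with probability eps / (r - 1).
r, eps = 3, 0.2
symmetric = [[(1 - eps) / r if x == y else eps / ((r - 1) * r) for y in range(r)]
             for x in range(r)]
skewed = [[0.3, 0.05, 0.0], [0.05, 0.3, 0.05], [0.0, 0.05, 0.2]]
print(fano_sides(symmetric))   # the two sides agree up to rounding
print(fano_sides(skewed))      # H(X|Y) lies strictly below the bound here
```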
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Theorem 10 maximum entropy 


The maximum uncertainty occurs when the events are equiprobable. 
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Proof. Since, 
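$$H\left(\frac{1}{m}, \ldots, \frac{1}{m}\right) - H(p_1, \ldots, p_m) = \log_2 m + \sum_{i=1}^{m} p_i \log_2 p_i = \log_2 e \sum_{i=1}^{m} p_i \ln(m p_i) \ge \log_2 e \sum_{i=1}^{m} p_i \left(1 - \frac{1}{m p_i}\right) = 0$$
it being the case that
$$\ln x \ge 1 - \frac{1}{x}$$
with equality only when $x = 1$. Therefore $H(p_1, \ldots, p_m)$ is maximised when
$$p_i = \frac{1}{m}$$
for all $i = 1, \ldots, m$. □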
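A quick randomized illustration of Theorem 10, assuming base-2 logarithms: random distributions on $m$ outcomes, drawn with a fixed seed so the run is repeatable, never exceed $\log_2 m$, which the uniform distribution attains. This illustrates the theorem; it does not prove it.

```python
import math
import random

def entropy_bits(probs):
    """H(p_1, ..., p_m) in bits, with 0 log 0 = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

rng = random.Random(0)   # fixed seed so the run is repeatable
m = 5
worst_gap = math.inf     # smallest observed value of log2(m) - H(p)
for _ in range(1000):
    w = [rng.random() for _ in range(m)]
    total = sum(w)
    probs = [wi / total for wi in w]
    worst_gap = min(worst_gap, math.log2(m) - entropy_bits(probs))
print(worst_gap, entropy_bits([1 / m] * m), math.log2(m))
```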
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Example 12 inequality for a bound on Ina 
Figure 3 shows that $\ln x \le x - 1$, while Figure 4 shows that no such inequality holds when the logarithm in question is of base 10.

Figure 3 Plots of $\ln x$ and $x - 1$, which show that $\ln x \le x - 1$.
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Figure 4 Graphs of $y = \log_{10} x$ and $y = x - 1$, which show that the latter is no bound for the values of the former.
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Example 13 inequality for a bound on In+ 
Figure 5 confirms for us that $\ln \frac{1}{x} \ge 1 - x$, whereas Figure 6 tells us that this is not the case for $\log_{10} \frac{1}{x}$.

Figure 5 Plots of $\ln \frac{1}{x}$ and $1 - x$, which show that $\ln \frac{1}{x} \ge 1 - x$.
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Figure 6 Graphs of $y = \log_{10} \frac{1}{x}$ and $y = 1 - x$, from which it is clear that the latter gives no bound for the former.
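The figures of Examples 12 and 13 can be backed by a numerical check on the plotted range $0 < x \le 3$: the natural-logarithm bounds hold at every sample point, while both base-10 versions fail just below $x = 1$. The grid spacing and rounding tolerance are arbitrary choices for this sketch.

```python
import math

xs = [k / 1000 for k in range(1, 3001)]   # sample points in (0, 3]

# Example 12: ln x <= x - 1; Example 13: ln(1/x) >= 1 - x (tiny tolerance for rounding).
nat_ok = all(math.log(x) <= x - 1 + 1e-12 for x in xs)
nat_inv_ok = all(math.log(1 / x) >= 1 - x - 1e-12 for x in xs)

# Base 10: points where the corresponding inequalities break down.
violations = [x for x in xs if math.log10(x) > x - 1]
violations_inv = [x for x in xs if math.log10(1 / x) < 1 - x]
print(nat_ok, nat_inv_ok, min(violations), max(violations))
```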
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Example 14 two events 


Consider two events with probabilities $p$ and $1 - p$. The entropy function is then
$$H(p, 1 - p) = -p \log p - (1 - p) \log(1 - p)$$
Whenever either event becomes certain, the entropy function becomes zero. Mathematically we see that
$$\lim_{p \to 0} p \log p = 0$$
and
$$\lim_{p \to 1} p \log p = 0$$
so both terms vanish at $p = 0$ and at $p = 1$.

Figure 7 shows a plot of the values of the entropy function for two events. The base-2 logarithm is used here.
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Figure 7 The entropy function of two events with probabilities p and 
$1 - p$, namely $H(p, 1 - p) = -p \log_2 p - (1 - p) \log_2(1 - p)$, in bits.
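A direct transcription of the two-event entropy plotted in Figure 7, in bits. The endpoint cases use the limits $p \log p \to 0$ given in Example 14; the function name is ours.

```python
import math

def binary_entropy(p):
    """H(p, 1 - p) in bits, using p log p -> 0 at p = 0 and p = 1."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
    print(f"p = {p:4.2f}   H = {binary_entropy(p):.4f} bits")
```

The curve is symmetric about $p = 1/2$, where it reaches its maximum of one bit.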
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Definition 22 mutual information 


The mutual information is 


$$I(X;Y) = H(X) - H(X \mid Y)$$


It represents the information provided about X by Y. 
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Example 15 mutual information 


Alternatively, the mutual information may take the following forms, cf. Definition 14,
$$I(X;Y) = \sum_{x,y} p(x, y) \log \frac{p(x \mid y)}{p(x)} = \sum_{x,y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)} = \sum_{x,y} p(x, y) \log \frac{p(y \mid x)}{p(y)}$$
That is to say, $I(X;Y)$ is the average, taken over the $X, Y$ sample space, of the random variable $I(x;y)$ such that
$$I(x;y) = \log \frac{p(x \mid y)}{p(x)} = \log \frac{p(x, y)}{p(x)\, p(y)} = \log \frac{p(y \mid x)}{p(y)}$$
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Theorem 11 mutual information 


For any discrete random variables $X$ and $Y$,
$$I(X;Y) \ge 0$$
Moreover,
$$I(X;Y) = 0$$
if and only if $X$ and $Y$ are independent.
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Proof. From one of our formulae for the mutual information and from 
Jensen's inequality,
$$-I(X;Y) = \sum_{x,y} p(x, y) \log \frac{p(x)\, p(y)}{p(x, y)} \le \log \sum_{x,y} p(x)\, p(y) = \log 1 = 0$$
Furthermore, the equality sign holds if and only if
$$p(x)\, p(y) = p(x, y)$$
for all $x$ and $y$, that is to say, when $X$ and $Y$ are independent of each other. □
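A sketch that evaluates the three forms of $I(X;Y)$ from Example 15 on a hypothetical joint table and illustrates Theorem 11: the value is positive for dependent variables and vanishes, up to rounding, for a product distribution. Storing the table as a dictionary keyed by $(x, y)$ is an assumption of this sketch.

```python
import math

def mutual_information(joint):
    """Return I(X;Y) in bits, computed from the three forms of Example 15."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    support = [(x, y, p) for (x, y), p in joint.items() if p > 0]
    i1 = sum(p * math.log2((p / py[y]) / px[x]) for x, y, p in support)  # p(x|y)/p(x)
    i2 = sum(p * math.log2(p / (px[x] * py[y])) for x, y, p in support)  # p(x,y)/(p(x)p(y))
    i3 = sum(p * math.log2((p / px[x]) / py[y]) for x, y, p in support)  # p(y|x)/p(y)
    return i1, i2, i3

dependent = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
independent = {(x, y): [0.3, 0.7][x] * [0.6, 0.4][y] for x in (0, 1) for y in (0, 1)}
print(mutual_information(dependent))    # three equal, positive values
print(mutual_information(independent))  # three values that are 0 up to rounding
```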
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Example 16 mutual information in various forms 


From our formulae for the mutual information, we may see that
$$I(X;Y) = I(Y;X)$$
and
$$I(X;Y) = H(Y) - H(Y \mid X)$$
Also,
$$I(X;Y) = \sum_{x,y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)}$$
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Definition 23 mutual information for three random variables 


Let $X$, $Y$ and $Z$ be three random variables. Then the mutual information $I(X, Y; Z)$ is given by
$$I(X, Y; Z) = \mathrm{E}\left[\log \frac{p(z \mid x, y)}{p(z)}\right] = \sum_{x,y,z} p(x, y, z) \log \frac{p(z \mid x, y)}{p(z)}$$
This mutual information is the amount of information $X$ and $Y$ provide about $Z$.
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Theorem 12 mutual information for three random variables 
Let $X$, $Y$ and $Z$ be three random variables. Then we have
$$I(X, Y; Z) \ge I(Y; Z)$$
where the equality holds if and only if
$$p(z \mid x, y) = p(z \mid y)$$
for all $(x, y, z)$ such that
$$p(x, y, z) > 0$$
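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Proof.
$$I(Y;Z) - I(X, Y; Z) = \mathrm{E}\left[\log \frac{p(z \mid y)}{p(z)} - \log \frac{p(z \mid x, y)}{p(z)}\right] = \mathrm{E}\left[\log \frac{p(z \mid y)}{p(z \mid x, y)}\right] = \sum_{x,y,z} p(x, y, z) \log \frac{p(z \mid y)}{p(z \mid x, y)}$$
Then using Jensen's inequality, we have
$$I(Y;Z) - I(X, Y; Z) \le \log \sum_{x,y,z} p(x, y, z) \frac{p(z \mid y)}{p(z \mid x, y)} = \log \sum_{x,y,z} p(x, y)\, p(z \mid y) = \log 1 = 0$$
Since the logarithm is strictly convex cap, the equality holds exactly when the ratio $p(z \mid y)/p(z \mid x, y)$ equals 1 wherever $p(x, y, z) > 0$. □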
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Theorem 13 Markov’s chain 
Let $(X, Y, Z)$ be a Markov chain. Then
$$I(X;Z) \le I(X;Y) \quad \text{and} \quad I(X;Z) \le I(Y;Z)$$
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Proof. From Theorem 12, 


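$$I(X;Z) \le I(X, Y; Z)$$
with the roles of $X$ and $Y$ interchanged. Because $(X, Y, Z)$ is a Markov chain, $p(z \mid x, y) = p(z \mid y)$, so by the equality condition of Theorem 12
$$I(X, Y; Z) = I(Y; Z)$$
Therefore
$$I(X;Z) \le I(Y;Z)$$
Next, since $(X, Y, Z)$ is a Markov chain, $(Z, Y, X)$ is also a Markov chain. Hence, by the same argument with $X$ and $Z$ interchanged,
$$I(X;Z) \le I(X;Y)$$
□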
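Theorem 13, often called the data-processing inequality, can be checked on a small example. This sketch builds a hypothetical Markov chain $X \to Y \to Z$ from two channel matrices, so that $p(z \mid x, y) = p(z \mid y)$ holds by construction, and compares the three pairwise mutual informations. All numbers are illustrative.

```python
import math

def mi(joint):
    """I(A;B) in bits for a joint table {(a, b): p}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * math.log2(p / (pa[a] * pb[b])) for (a, b), p in joint.items() if p > 0)

# Hypothetical chain X -> Y -> Z: Z depends on X only through Y.
p_x = [0.6, 0.4]
p_y_given_x = [[0.9, 0.1], [0.2, 0.8]]   # row x, column y
p_z_given_y = [[0.7, 0.3], [0.1, 0.9]]   # row y, column z
p_xyz = {(x, y, z): p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z]
         for x in range(2) for y in range(2) for z in range(2)}

def pair(i, j):
    """Marginalise p(x, y, z) onto coordinates i and j (0 = X, 1 = Y, 2 = Z)."""
    out = {}
    for key, p in p_xyz.items():
        k = (key[i], key[j])
        out[k] = out.get(k, 0.0) + p
    return out

i_xy, i_yz, i_xz = mi(pair(0, 1)), mi(pair(1, 2)), mi(pair(0, 2))
print(i_xz, "<=", min(i_xy, i_yz))
```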
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Definition 24 group 


A group is a non-empty set G together with an operation, called multiplication, 
which associates with each ordered pair x, y of elements in G a third element, 
their product, in G such that, 


1. multiplication is associative; 
2. there exists an identity element e in G; and 
3. for each element x in G there exists an inverse of x. 
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In other words, for x and y in G there exists xy in G such that, 


1. for any $x$, $y$ and $z$ in $G$, $x(yz) = (xy)z$;
2. there exists $e$ in $G$ such that $xe = ex = x$ for every $x$ in $G$; and
3. to each $x$ in $G$ there corresponds $x^{-1}$ in $G$ such that $xx^{-1} = x^{-1}x = e$.


A group is called an Abelian, or commutative, group if
$$xy = yx$$
for all elements $x$ and $y$ in $G$. The group $G$ is called a finite group if it consists of a finite number of elements, otherwise it is called an infinite group. The number of elements of a finite group $G$ is called its order.
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Theorem 14 uniqueness of an identity 


Both the identity $e$ of a group $G$ and the inverse $x^{-1}$ of each element $x$ of $G$ are unique.
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Proof. Suppose $e_0$ is another element in $G$ such that
$$x e_0 = e_0 x = x$$
for every $x$ in $G$. Then
$$e_0 = e_0 e = e$$
where the first equality holds because $e$ is an identity and the second because $e_0$ is one; hence the identity element is unique. Suppose, for some $x$ in $G$, that $x_0$ is another element in $G$ such that
$$x x_0 = x_0 x = e$$
Then
$$x_0 = x_0 e = x_0 (x x^{-1}) = (x_0 x) x^{-1} = e x^{-1} = x^{-1}$$
hence the inverse of each element of $G$ is unique. □
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Definition 25 ring 
A ring is an additive Abelian group R which is closed under a second opera- 
tion, called multiplication, in such a manner that, 


1. multiplication is associative; and 
2. multiplication is distributive over addition.
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That is to say, if x, y and z are any three elements in R, then, 

1. $x(yz) = (xy)z$; and

2. $x(y + z) = xy + xz$ and $(x + y)z = xz + yz$.

A ring is called a commutative ring if
$$xy = yx$$
for all elements $x$ and $y$ in $R$. If a ring $R$ has a non-zero element $1$ with the property that
$$x1 = 1x = x$$
for every $x$, then $1$ is called an identity element, and $R$ is said to be a ring with identity.
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Definition 26 regular- and singular elements 


Let $x$ be an element of $R$, a ring with identity. Then $x$ is said to be regular if its inverse $x^{-1}$ exists, otherwise it is said to be singular. Regular elements are also called invertible or non-singular elements. Furthermore, $R$ is called a division ring if all its non-zero elements are regular.

§ 


Definition 27 field 


A field is a commutative division ring. 
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Example 17 field 


A field, then, is a non-empty set $F$ together with two operations on its elements, namely addition and multiplication, such that under addition $F$ is closed, commutative and associative, has a unique identity $0$, and has for each of its elements a unique inverse; and under multiplication $F$ is closed, commutative and associative, has a unique identity $1$, and has for each of its non-zero elements a unique inverse. Furthermore, multiplication in $F$ is distributive over addition.

These properties of the field are inherited from its progenitors, since the field is defined in terms of the division ring, which itself is defined in terms of the ring, which in turn is defined in terms of the group.
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Table 1 shows the sources from which each of the properties of the field is defined. 
operator         property       defining definition
addition         closed         group
                 commutative    Abelian group
                 associative    group
                 identity       group
                 inverse        group
multiplication   closed         ring
                 commutative    commutative ring
                 associative    ring
                 identity       ring with identity
                 inverse        division ring (non-zero elements)

Table 1 The definitions in which the various properties of the field originate.
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Theorem 15 negative 
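For any element $a$ in a field $F$, we have
$$(-1) \cdot a = -a$$
Proof. Since
$$(-1) \cdot a + a = (-1) \cdot a + 1 \cdot a = ((-1) + 1) \cdot a = 0 \cdot a = 0$$
and since $a + (-a) = 0$, the uniqueness of the additive inverse (Theorem 14) gives $(-1) \cdot a = -a$. □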
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Theorem 16 zeros 
Let $a$ and $b$ be any two elements in a field $F$. Then
$$ab = 0 \text{ implies } a = 0 \text{ or } b = 0$$
Proof. Suppose $ab = 0$. If $a \ne 0$, then
$$0 = a^{-1} \cdot 0 = a^{-1}(ab) = (a^{-1}a)b = 1 \cdot b = b$$
So either $a = 0$, or $a \ne 0$ and then $b = 0$, which proves the statement. □
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Definition 28 modulo 


Let $a$, $b$ and $m$ be integers, and let $m > 1$. Then $a$ is said to be congruent to $b$ modulo $m$, written
$$a \equiv b \pmod{m}$$
if $m \mid (a - b)$, that is to say, $m$ divides $a - b$. The number $m$ is called the modulus, and $b$ is called a residue of $a \pmod{m}$. When $0 \le b < m$, $b$ is also called the principal remainder of $a$ divided by $m$, and denoted by
$$(a \pmod{m})$$
A residue is said to be common if $0 \le b < m$.
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Theorem 17 congruence 


Any integer $a$ is congruent to exactly one of $0, 1, \ldots, m - 1$ modulo $m$.
Proof. Let $a$ and $m$ be integers, and let $m > 1$. Then, by division with remainder, there exist integers $k$ and $b$ such that $a = mk + b$, where $0 \le b \le m - 1$, so that $a \equiv b \pmod{m}$. Any $c$ in $\{0, \ldots, m - 1\}$ with $a \equiv c \pmod{m}$ gives another such representation $a = mk' + c$, so it remains to show that $b$ is uniquely determined by $m$ and $a$.

To prove that $b$ is unique, suppose that $a = mk_1 + b_1$ and $a = mk_2 + b_2$, where $0 \le b_1 \le m - 1$ and $0 \le b_2 \le m - 1$, and that $b_1 \ne b_2$. Then $a - mk_1 \ne a - mk_2$, and therefore $k_1 \ne k_2$. Since the labels are arbitrary, let $k_1 > k_2$ and let $k_1 = k_2 + n$ with $n \ge 1$. Then
$$mk_2 + b_2 = a = m(k_2 + n) + b_1 = mk_2 + b_1 + mn$$
so that $b_2 = b_1 + mn$, and since $b_1 \ge 0$, $m > 0$ and $n \ge 1$, we have $b_2 \ge m$, which contradicts $b_2 \le m - 1$. So necessarily $b_1 = b_2$. □
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Theorem 18 properties of modulo 


Let $a$, $b$ and $m$ be integers, and let $m > 1$. Then the following properties hold for congruence.

a. $a \equiv b \pmod{0}$ implies $a = b$

b. either $a \equiv b \pmod{m}$ or $a \not\equiv b \pmod{m}$

c. $a \equiv a \pmod{m}$

d. $a \equiv b \pmod{m}$ implies $b \equiv a \pmod{m}$

e. if $a \equiv b \pmod{m}$ and $b \equiv c \pmod{m}$, then $a \equiv c \pmod{m}$

Let $a \equiv b \pmod{m}$ and $c \equiv d \pmod{m}$. Then

f. $a + c \equiv b + d \pmod{m}$

g. $a - c \equiv b - d \pmod{m}$

h. $ac \equiv bd \pmod{m}$
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Further, let k and n be integers. Then, 


i. if $a \equiv b \pmod{m}$, then $ka \equiv kb \pmod{m}$
j. if $a \equiv b \pmod{m}$, then $a^n \equiv b^n \pmod{m}$ for $n \ge 1$
k. if $a \equiv b \pmod{m_1}$ and $a \equiv b \pmod{m_2}$, then
$$a \equiv b \pmod{\operatorname{lcm}(m_1, m_2)}$$
where $\operatorname{lcm}(x, y)$ is the least common multiple of $x$ and $y$, that is, the smallest positive $z$ such that there exist positive integers $p$ and $q$ with
$$px = qy = z$$
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1. if a* = b*(modm), then, 


From above properties, it follows that, 


m. if $a \equiv b \pmod{m}$, then
$$P(a) \equiv P(b) \pmod{m}$$
where $P(x)$ is a polynomial with integer coefficients.

Properties (a), (b), (c), (d) and (e) are called equivalence, determination, reflexivity, symmetry and transitivity respectively.
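In Python, for a positive modulus the `%` operator returns the principal remainder in $\{0, \ldots, m - 1\}$ even for negative $a$, which makes Definition 28 and several properties of Theorem 18 easy to spot-check. The brute-force ranges and the sample polynomial below are arbitrary, and passing these checks is a sanity check, not a proof.

```python
import math
from itertools import product

def congruent(a, b, m):
    """Definition 28: a is congruent to b modulo m when m divides a - b."""
    return (a - b) % m == 0

print((-7) % 5, 17 % 5)   # 3 2: principal remainders

def P(x):
    """A sample polynomial with integer coefficients, for property (m)."""
    return 2 * x ** 2 - 5 * x + 1

vals, mods = range(-6, 7), range(2, 7)
for a, b, c, d in product(vals, repeat=4):
    for m in mods:
        if congruent(a, b, m) and congruent(c, d, m):
            assert congruent(a + c, b + d, m)   # (f)
            assert congruent(a - c, b - d, m)   # (g)
            assert congruent(a * c, b * d, m)   # (h)
for a, b in product(vals, repeat=2):
    for m in mods:
        if congruent(a, b, m):
            assert congruent(a ** 3, b ** 3, m)   # (j) with n = 3
            assert congruent(P(a), P(b), m)       # (m)
    for m1, m2 in product(mods, repeat=2):
        if congruent(a, b, m1) and congruent(a, b, m2):
            assert congruent(a, b, m1 * m2 // math.gcd(m1, m2))   # (k), via the lcm
print("all checks passed")
```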
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Definition 29 set of integer modulo m 


We denote by $\mathbb{Z}_m$, or $\mathbb{Z}/(m)$, the set $\{0, \ldots, m - 1\}$, where $m > 1$, and define addition and multiplication on it as
$$a \oplus b = (a + b \pmod{m})$$
and
$$a \odot b = (ab \pmod{m})$$
respectively; for simplicity these may be denoted $a + b$ and $ab$.
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Example 18 ring of integer modulo m 


The set $\mathbb{Z}_m$ together with the addition and multiplication introduced in Definition 29 forms a ring.
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Theorem 19 condition when $\mathbb{Z}_m$ is a field
The ring $\mathbb{Z}_m$ is a field if and only if $m$ is prime.

Proof. First we prove that $m$ being prime implies that $\mathbb{Z}_m$ is a field. Let $m$ be a prime. Then any $a \ne 0$ in $\mathbb{Z}_m$, in other words $0 < a < m$, is prime relative to $m$. Therefore there exist two integers $u$ and $v$, where $0 \le u \le m - 1$ (reducing $u$ modulo $m$ if necessary), such that $ua + vm = 1$, which means that $ua \equiv 1 \pmod{m}$. Hence $u = a^{-1}$, and since this applies to every non-zero $a$ in $\mathbb{Z}_m$, it follows that $\mathbb{Z}_m$ is a field.

Next we prove that if $m$ is not a prime, then $\mathbb{Z}_m$ is not a field. Suppose that $m$ is not a prime. Then $m = ab$ for some $a$ and $b$, where $1 < a < m$ and $1 < b < m$. But then $ab = 0$ in $\mathbb{Z}_m$, and if $\mathbb{Z}_m$ were a field, Theorem 16 would give $a = 0$ or $b = 0$. This contradicts the values of $a$ and $b$ given above, thus $\mathbb{Z}_m$ is not a field. □
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Definition 30 notation for rings 
We denote by $na$ the element
$$\sum_{i=1}^{n} a = a + a + \cdots + a \quad (n \text{ terms})$$
for any element $a$ in a ring $R$ and an integer $n \ge 1$.


Definition 31 characteristic of a field 


Let $F$ be a field. Then the characteristic of $F$ is the least positive integer $p$ such that $p \cdot 1 = 0$, where $1$ is the multiplicative identity of $F$. Where no such $p$ exists, the characteristic is defined to be zero.

By $F^*$ we mean $F \setminus \{0\}$.
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Theorem 20 characteristic of a field 
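The characteristic of a field is either zero or a prime number.
Proof. Consider a field $F$. Since $1 \cdot 1 = 1 \ne 0$, $1$ is not the characteristic of $F$. Suppose the characteristic is $p = mn$, where $1 < n < p$ and $1 < m < p$. If $a = m \cdot 1$ and $b = n \cdot 1$, then
$$a \cdot b = (m \cdot 1)(n \cdot 1) = (mn) \cdot (1 \cdot 1) = mn \cdot 1 = p \cdot 1 = 0$$
By Theorem 16 this implies $a = 0$ or $b = 0$, that is, $m \cdot 1 = 0$ or $n \cdot 1 = 0$ with $m, n < p$, which contradicts $p$ being the least such positive integer. □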
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Definition 32 subfield 


Let $E$ and $F$ be two fields, and let $F$ be a subset of $E$. Then $F$ is called a subfield of $E$ if the addition and multiplication of $E$, when restricted to $F$, are the same as those of $F$.
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Theorem 21 elements of finite field 
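A finite field $F$ of characteristic $p$ contains $p^n$ elements for some integer $n \ge 1$.
Proof. Since $F$ is finite, the elements $1 \cdot 1, 2 \cdot 1, 3 \cdot 1, \ldots$ cannot all be distinct, so $k \cdot 1 = 0$ for some positive $k$; hence $p \ne 0$, and by Theorem 20 $p$ is a prime. Choose an element $\alpha_1$ from $F^*$. Then
$$0 \cdot \alpha_1, 1 \cdot \alpha_1, \ldots, (p - 1) \cdot \alpha_1$$
are pairwise distinct, for if
$$i \cdot \alpha_1 = j \cdot \alpha_1$$
for some
$$0 \le i < j \le p - 1$$
then $((j - i) \cdot 1)\, \alpha_1 = (j - i) \cdot \alpha_1 = 0$, and since $\alpha_1 \ne 0$, Theorem 16 gives $(j - i) \cdot 1 = 0$ with $0 < j - i < p$, which contradicts $p$ being the characteristic of $F$. Hence $i = j$.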
Coding theory, Group, field and finite field, 11th November 2005 -23- From 25th October 2005, as of 14th January, 2007


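Next, if
$$F \setminus \{0 \cdot \alpha_1, \ldots, (p - 1) \cdot \alpha_1\}$$
is not empty, we choose $\alpha_2$ from it. Then the elements
$$a_1 \alpha_1 + a_2 \alpha_2$$
are pairwise distinct for all $0 \le a_1, a_2 \le p - 1$, for if $a_1 \alpha_1 + a_2 \alpha_2 = b_1 \alpha_1 + b_2 \alpha_2$ for some $0 \le a_1, a_2, b_1, b_2 \le p - 1$, then necessarily $a_2 = b_2$, because otherwise
$$\alpha_2 = \frac{a_1 - b_1}{b_2 - a_2}\, \alpha_1$$
where the quotient is taken in $\mathbb{Z}_p$, so that $\alpha_2$ would be one of $0 \cdot \alpha_1, \ldots, (p - 1) \cdot \alpha_1$, which contradicts the way we have chosen $\alpha_2$. Then $a_1 \alpha_1 = b_1 \alpha_1$, so $a_1 = b_1$ by the first step, and it follows that
$$(a_1, a_2) = (b_1, b_2)$$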
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Since $F$ is finite, we may continue in this fashion with $\alpha_3$, $\alpha_4$, and so on until $\alpha_n$ for some integer $n$, choosing each $\alpha_j$, $2 \le j \le n$, from
$$F \setminus \Big\{ \sum_{i=1}^{j-1} a_i \alpha_i \Big\}$$
where $a_i$, $i = 1, \ldots, j - 1$, are in $\mathbb{Z}_p$. In the end,
$$F = \Big\{ \sum_{i=1}^{n} a_i \alpha_i \Big\}$$
where $a_1, \ldots, a_n$ are in $\mathbb{Z}_p$. In the same manner as above, we may show that the elements
$$a_1 \alpha_1 + \cdots + a_n \alpha_n$$
are pairwise distinct for all choices of $a_i$ in $\mathbb{Z}_p$, where $i = 1, \ldots, n$. Therefore
$$|F| = p^n$$
□
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Definition 33 polynomial ring 


Let $F$ be a field. Then the set
$$F[x] = \Big\{ \sum_{i=0}^{n} a_i x^i \Big\}$$
where $a_i$ is an element in $F$ and $n \ge 0$, is called the polynomial ring over $F$. An element of $F[x]$ is called a polynomial over $F$. For a polynomial
$$f(x) = \sum_{i=0}^{n} a_i x^i$$
provided that $a_n \ne 0$, the integer $n$ is called the degree of $f(x)$, denoted by $\deg(f(x))$. We define $\deg(0) = -\infty$. A nonzero polynomial $f(x)$ of degree $n$ is said to be monic if $a_n = 1$. Furthermore, a polynomial $f(x)$ is said to be reducible over $F$ if there exist two polynomials $g(x)$ and $h(x)$ over $F$ such that $\deg(g(x)) < \deg(f(x))$ and $\deg(h(x)) < \deg(f(x))$, and $f(x) = g(x)h(x)$. A polynomial is said to be irreducible over $F$ if it is not reducible.
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Definition 34 remainder of polynomial ring 


Let $f(x)$ in $F[x]$ be a polynomial of degree $n \ge 1$. Then, for any polynomial $g(x)$ in $F[x]$ there exists a unique pair $(s(x), r(x))$ of polynomials, where
$$\deg(r(x)) < \deg(f(x))$$
or $r(x) = 0$, such that
$$g(x) = s(x) f(x) + r(x)$$
Here $r(x)$ is called the principal remainder of $g(x)$ divided by $f(x)$, or in our notation
$$(g(x) \pmod{f(x)})$$
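A minimal sketch of the division in Definition 34 for the case $F = \mathbb{Z}_p$ with $p$ prime, assuming polynomials are stored as coefficient lists with the lowest degree first. It inverts the leading coefficient of $f(x)$ as $c^{p-2}$ (Fermat's little theorem), which relies on $p$ being prime; `poly_divmod` is our own name.

```python
def poly_divmod(g, f, p):
    """Return (s, r) with g = s*f + r over Z_p and deg r < deg f (Definition 34).

    Polynomials are coefficient lists, lowest degree first; p must be prime.
    """
    f = [c % p for c in f]
    while f and f[-1] == 0:              # drop zero leading coefficients
        f.pop()
    if not f:
        raise ZeroDivisionError("division by the zero polynomial")
    r = [c % p for c in g]
    while r and r[-1] == 0:
        r.pop()
    s = [0] * max(len(r) - len(f) + 1, 1)
    lead_inv = pow(f[-1], p - 2, p)      # inverse of the leading coefficient
    while len(r) >= len(f):
        coef = r[-1] * lead_inv % p
        shift = len(r) - len(f)
        s[shift] = coef
        for i, fc in enumerate(f):       # subtract coef * x^shift * f(x)
            r[shift + i] = (r[shift + i] - coef * fc) % p
        while r and r[-1] == 0:          # the leading term has cancelled
            r.pop()
    return s, r

# x^3 + 1 = (x + 1)(x^2 + x + 1) over Z_2, so the remainder is 0 (an empty list).
print(poly_divmod([1, 0, 0, 1], [1, 1, 1], 2))   # ([1, 1], [])
```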
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Definition 35 the greatest common divisor and the least common multiple
Let $f(x)$ and $g(x)$ in $F[x]$ be two nonzero polynomials. The greatest common divisor of $f(x)$ and $g(x)$, written $\gcd(f(x), g(x))$, is the monic polynomial of the highest degree which is a divisor of both $f(x)$ and $g(x)$. Two polynomials $f(x)$ and $g(x)$ are said to be co-prime, or prime to each other, if $\gcd(f(x), g(x)) = 1$. The least common multiple of $f(x)$ and $g(x)$, namely $\operatorname{lcm}(f(x), g(x))$, is the monic polynomial of the lowest degree which is a multiple of both $f(x)$ and $g(x)$.
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Example 19 factorisation 
Let the factorisations of two polynomials $f(x)$ and $g(x)$ be
$$f(x) = a \cdot (p_1(x))^{e_1} \cdots (p_n(x))^{e_n}$$
and
$$g(x) = b \cdot (p_1(x))^{d_1} \cdots (p_n(x))^{d_n}$$
where $a$ and $b$ are in $F^*$, $e_i, d_i \ge 0$, and the $p_i(x)$ are distinct monic irreducible polynomials. Then
$$\gcd(f(x), g(x)) = (p_1(x))^{\min(e_1, d_1)} \cdots (p_n(x))^{\min(e_n, d_n)}$$
and
$$\operatorname{lcm}(f(x), g(x)) = (p_1(x))^{\max(e_1, d_1)} \cdots (p_n(x))^{\max(e_n, d_n)}$$
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Example 20 greatest common divisor 


Let $f(x)$ and $g(x)$ in $F[x]$ be two nonzero polynomials. Then there exist two polynomials $u(x)$ and $v(x)$ having
$$\deg(u(x)) < \deg(g(x))$$
and
$$\deg(v(x)) < \deg(f(x))$$
such that
$$\gcd(f(x), g(x)) = u(x) f(x) + v(x) g(x)$$
Then
$$\gcd(f(x)h(x), g(x)) = \gcd(f(x), g(x))$$
if $\gcd(h(x), g(x)) = 1$.
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Theorem 22 rings of polynomial 


Let $f(x)$ be a polynomial of degree $n$ over a field $F$, where $n \ge 1$. Then $F[x]/(f(x))$, together with the addition
$$g(x) \oplus h(x) = (g(x) + h(x) \pmod{f(x)})$$
also written $g(x) + h(x)$, and the multiplication
$$g(x) \odot h(x) = (g(x)h(x) \pmod{f(x)})$$
also written $g(x) \cdot h(x)$, forms a ring. Furthermore, $F[x]/(f(x))$ is a field if and only if $f(x)$ is irreducible.


Example 21 rings of polynomial 


Consider the ring $\mathbb{Z}_2[x]/(1 + x^2) = \{0, 1, x, 1 + x\}$. Its addition and multiplication tables are shown in Table 2.

  +   |  0     1     x     1+x
  0   |  0     1     x     1+x
  1   |  1     0     1+x   x
  x   |  x     1+x   0     1
 1+x  |  1+x   x     1     0

  ·   |  0     1     x     1+x
  0   |  0     0     0     0
  1   |  0     1     x     1+x
  x   |  0     x     1     1+x
 1+x  |  0     1+x   1+x   0

Table 2 Addition and multiplication tables for $\mathbb{Z}_2[x]/(1 + x^2)$.
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Example 22 


Consider the ring $\mathbb{Z}_2[x]/(1 + x + x^2)$. Its addition and multiplication tables are given in Table 3.

  +   |  0     1     x     1+x
  0   |  0     1     x     1+x
  1   |  1     0     1+x   x
  x   |  x     1+x   0     1
 1+x  |  1+x   x     1     0

  ·   |  0     1     x     1+x
  0   |  0     0     0     0
  1   |  0     1     x     1+x
  x   |  0     x     1+x   1
 1+x  |  0     1+x   1     x

Table 3 Addition and multiplication tables for $\mathbb{Z}_2[x]/(1 + x + x^2)$.
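The multiplication parts of Tables 2 and 3 can be regenerated mechanically. This sketch assumes a monic modulus $f(x)$ over $\mathbb{Z}_2$ and represents $a_0 + a_1 x$ by the tuple $(a_0, a_1)$; the helper names are ours. The product $(1 + x)(1 + x) = 0$ in the first ring shows why $\mathbb{Z}_2[x]/(1 + x^2)$ is not a field, whereas the second ring, built on the irreducible $1 + x + x^2$, has no such zero divisors, in line with Theorem 22.

```python
def mul_mod(a, b, f, p=2):
    """Product of a and b in Z_p[x]/(f); coefficient lists, lowest degree first, f monic."""
    prod = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] = (prod[i + j] + ai * bj) % p
    n = len(f) - 1                                   # deg f
    for k in range(len(prod) - 1, n - 1, -1):        # cancel x^k using x^(k-n) * f(x)
        c = prod[k]
        if c:
            for i in range(n + 1):
                prod[k - n + i] = (prod[k - n + i] - c * f[i]) % p
    return tuple((prod + [0] * n)[:n])

names = {(0, 0): "0", (1, 0): "1", (0, 1): "x", (1, 1): "1+x"}
elements = list(names)

def print_table(f):
    """Print the multiplication table of Z_2[x]/(f) for deg f = 2."""
    for a in elements:
        row = [names[mul_mod(list(a), list(b), f)] for b in elements]
        print(f"{names[a]:>4} |" + "".join(f"{c:>5}" for c in row))

print_table([1, 0, 1])   # f = 1 + x^2: multiplication part of Table 2
print_table([1, 1, 1])   # f = 1 + x + x^2: multiplication part of Table 3
```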
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Example 23 analogies between $\mathbb{Z}$ and $F[x]$
Table 4 shows the analogies between $\mathbb{Z}$ and $F[x]$.

the ring of integers $\mathbb{Z}$ | the polynomial ring $F[x]$
an integer $m$ | a polynomial $f(x)$
a prime number $p$ | an irreducible polynomial $p(x)$
$\mathbb{Z}_m = \{0, \ldots, m - 1\}$ | $F[x]/(f(x)) = \{a_0 + a_1 x + \cdots + a_{n-1} x^{n-1} : a_i \in F\}$, where $n = \deg(f(x)) \ge 1$
$a \oplus b = (a + b \pmod{m})$ | $g(x) \oplus h(x) = (g(x) + h(x) \pmod{f(x)})$
$a \odot b = (ab \pmod{m})$ | $g(x) \odot h(x) = (g(x)h(x) \pmod{f(x)})$
$\mathbb{Z}_m$ is a ring | $F[x]/(f(x))$ is a ring
$\mathbb{Z}_m$ is a field $\Leftrightarrow$ $m$ is a prime | $F[x]/(f(x))$ is a field $\Leftrightarrow$ $f(x)$ is irreducible

Table 4 Analogies between $\mathbb{Z}$ and $F[x]$.
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Theorem 23 finite fields 
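For every element $\phi$ of a finite field $F$ with $n$ elements, $\phi^n = \phi$.
Proof. The case when $\phi = 0$ is trivial. Next, if $\phi \ne 0$, then we can list all the nonzero elements of $F$ as
$$F^* = \{\phi_1, \ldots, \phi_{n-1}\}$$
And since $F^*$ is closed under multiplication, and $\phi \phi_i = \phi \phi_j$ implies $\phi_i = \phi_j$ because $\phi \ne 0$, multiplying each element of $F^*$ by $\phi$ gives
$$F^* = \{\phi \phi_1, \ldots, \phi \phi_{n-1}\}$$
Therefore
$$\phi_1 \cdots \phi_{n-1} = (\phi \phi_1) \cdots (\phi \phi_{n-1}) = \phi^{n-1} \phi_1 \cdots \phi_{n-1}$$
which, after cancelling the non-zero product $\phi_1 \cdots \phi_{n-1}$, leads to
$$\phi^{n-1} = 1$$
and hence $\phi^n = \phi$. □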
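For the prime fields $\mathbb{Z}_p$, Theorem 23 says $\phi^p = \phi$ for every $\phi$, which Python's three-argument `pow` checks quickly. The contrast with $\mathbb{Z}_8$, a ring that is not a field (and is not $GF(8)$), shows that the identity can fail outside a field. The helper name is ours.

```python
def satisfies_theorem_23(q):
    """Check phi**q == phi for every phi in Z_q, using modular exponentiation."""
    return all(pow(phi, q, q) == phi for phi in range(q))

print([q for q in (2, 3, 5, 7, 11, 13) if satisfies_theorem_23(q)])   # all six primes
print(satisfies_theorem_23(8))   # False: 2**8 % 8 == 0, not 2
```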
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Corollary 23[1] finite fields 
Let $F$ be a subfield of $E$, and let $|F| = n$. Then an element $\phi$ of $E$ is also in $F$ if and only if $\phi^n = \phi$.
Proof. The only if part was already proved in Theorem 23. For the if part, if $\phi$ satisfies $\phi^n = \phi$, then it is a root of $x^n - x$. Since $|F| = n$ means that all $n$ elements of $F$ are roots of $x^n - x$, and a polynomial of degree $n$ over the field $E$ has at most $n$ roots in $E$, it follows that $\phi$ lies in $F$. □
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Definition 36 finite field 


We denote a finite field with $q$ elements by $\mathbb{F}_q$ or $GF(q)$. Let $\alpha$ be a root of an irreducible polynomial $f(x)$ of degree $n$ over a field $F$. Then, if we replace $x$ in $F[x]/(f(x))$ by $\alpha$, the field $F[x]/(f(x))$ can be represented as
$$F[\alpha] = \Big\{ \sum_{i=0}^{n-1} a_i \alpha^i \Big\}$$
for $a_i$ in $F$.
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Definition 37 primitive element 


An element $\alpha$ in a finite field $\mathbb{F}_q$ is called a primitive element, or generator, of $\mathbb{F}_q$ if
$$\mathbb{F}_q = \{0, \alpha, \alpha^2, \ldots, \alpha^{q-1}\}$$


Definition 38 order 


The order, $\operatorname{ord}(\alpha)$, of a nonzero element $\alpha$ in $\mathbb{F}_q$ is the smallest positive integer $k$ such that $\alpha^k = 1$.
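A brute-force sketch of Definitions 37 and 38 for the prime field $\mathbb{Z}_q$; that $q$ is prime is an assumption of this code. It computes the order of each non-zero element and keeps those of order $q - 1$, which are exactly the primitive elements.

```python
def order(a, q):
    """Multiplicative order of a non-zero a in Z_q, q prime (Definition 38)."""
    k, power = 1, a % q
    while power != 1:
        power = power * a % q
        k += 1
    return k

def primitive_elements(q):
    """Elements whose powers run through all of Z_q^* (Definition 37)."""
    return [a for a in range(1, q) if order(a, q) == q - 1]

print({a: order(a, 7) for a in range(1, 7)})   # {1: 1, 2: 3, 3: 6, 4: 3, 5: 6, 6: 2}
print(primitive_elements(7))                   # [3, 5]
```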


§ 
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Definition 39 prime power 


A prime power is a prime or an integer power of a prime. 


Example 24 prime power 


Examples of prime powers are, 


2, 3, 4, 5, 7, 8, 9, 11, 13, 16, 17, ...
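A short test for Definition 39 that reproduces the list in Example 24: divide out the smallest prime factor and see whether anything is left. This trial-division approach, with a helper name of our own, is only meant for small numbers.

```python
def is_prime_power(q):
    """Definition 39: True if q = p**k for some prime p and integer k >= 1."""
    if q < 2:
        return False
    p = next(d for d in range(2, q + 1) if q % d == 0)   # smallest divisor > 1 is prime
    while q % p == 0:
        q //= p
    return q == 1

print([q for q in range(2, 30) if is_prime_power(q)])
```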
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Definition 40 linear code 


Let the alphabet be $\mathbb{F}_q$, in other words a Galois field $GF(q)$, where $q$ is a prime power, and let the vector space $V(n, q)$ be $(\mathbb{F}_q)^n$. Then a linear code over $GF(q)$, for some positive integer $n$, is a subspace of $V(n, q)$.
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Theorem 24 linear code 
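A non-empty subset $C$ of $V(n, q)$ is a linear code if and only if
a. $u + v \in C$ for all $u$ and $v$ in $C$
b. $au \in C$ for all $u \in C$ and $a \in GF(q)$

Proof. The proof follows from Definition 40 and Theorem 25: a subspace must be closed under addition and scalar multiplication, and conversely a non-empty subset of $V(n, q)$ with these two closure properties is a subspace. □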


Coding theory, Bounds in coding, 18th November 2005 -3- From 15th November 2005, as of 14th January, 2007




Example 25 linear binary code 


A binary code is linear if and only if the sum of any two code words is a code 
word. 
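Example 25 translates into a closure test under componentwise addition modulo 2. The sketch assumes a non-empty code given as tuples of 0s and 1s; the function names are ours.

```python
def add_gf2(u, v):
    """Componentwise sum of two binary words in GF(2)."""
    return tuple((a + b) % 2 for a, b in zip(u, v))

def is_linear_binary(code):
    """Example 25: a non-empty binary code is linear iff it is closed under addition."""
    words = set(code)
    return all(add_gf2(u, v) in words for u in words for v in words)

even_weight = [(0, 0, 0), (1, 1, 0), (1, 0, 1), (0, 1, 1)]
print(is_linear_binary(even_weight))                           # True
print(is_linear_binary([(0, 0, 0), (1, 1, 0), (1, 1, 1)]))     # False: 110 + 111 = 001 is missing
```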
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Definition 41 vector space 


A vector space $V$ is a set which is closed under finite vector addition and scalar multiplication. If the scalars are members of a field $F$, then $V$ is called a vector space over $F$. Furthermore, $V$ is a vector space over $F$ if and only if, for all members of $V$ and $F$, the following properties hold under addition,

a. commutativity 

b. associativity 

c. existence of an identity 

d. existence of an inverse 
while under multiplication the following, 

e. associativity under scalar multiplication 

f. distributivity of scalar sum 

g. distributivity of vector sum 

h. existence of a scalar multiplication identity 
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In other words, for all $x$, $y$ and $z$ in $V$ and all $r$ and $s$ in $F$,

a. $x + y = y + x$

b. $(x + y) + z = x + (y + z)$

c. $0 + x = x + 0 = x$

d. $x + (-x) = 0$

e. $r(sx) = (rs)x$

f. $(r + s)x = rx + sx$

g. $r(x + y) = rx + ry$

h. $1x = x$
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Example 26 vector space over finite field 


Let $q$ be a prime power, and let $GF(q)$ denote a finite field with $q$ elements. Then, by a vector space over a finite field we mean the set $GF(q)^n$ of all ordered $n$-tuples over $GF(q)$, which is closed under finite vector addition and scalar multiplication, that is to say, multiplication by some scalar $a$ in $GF(q)$.




Theorem 25 vector space 


A non-empty subset $C$ of $V(n, q)$ is a subspace if and only if $C$ is closed under addition and scalar multiplication.

Proof. What Theorem 25 states amounts to saying that a non-empty $C$ in $V(n, q)$ is a subspace if and only if

a. $x, y \in C$ implies $x + y \in C$

b. if $a \in GF(q)$ and $x \in C$, then $ax \in C$

Statements (a) and (b) are necessary for $C$ to be a subspace, since by Definition 41 a vector space is closed under addition and scalar multiplication. They are also sufficient: the properties required in Definition 41 hold in $C$ because they hold in $V(n, q)$, of which $C$ is a subset, and the zero vector $0 = 0x$ and the inverse $-x = (-1)x$ lie in $C$ by (b). □
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Definition 42 linear independence
A linear combination of $r$ vectors $v_1, \ldots, v_r$ in $V(n, q)$ is any vector of the form $\sum_{i=1}^{r} a_i v_i$, where the $a_i$ are scalars. Let $A$ be a set of vectors $\{v_1, \ldots, v_r\}$. Then $A$ is said to be linearly dependent if there exist scalars $a_1, \ldots, a_r$, not all of which are zero, such that
$$\sum_{i=1}^{r} a_i v_i = 0$$
And $A$ is linearly independent if it is not linearly dependent, that is to say, if $\sum_{i=1}^{r} a_i v_i = 0$ implies that $a_i = 0$ for all $i = 1, \ldots, r$.
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Definition 43 generating set and basis 


Let $C$ be a subspace of a vector space $V(n, q)$ over $GF(q)$. Then a subset $\{v_1, \ldots, v_r\}$ of $C$ is called a generating, or spanning, set of $C$ if every vector in $C$ can be expressed as a linear combination of $v_1, \ldots, v_r$.

A basis of $C$ is a generating set of $C$ which is also linearly independent.
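A sketch of Definition 43 over $GF(2)$: listing every linear combination of a basis gives the whole code, and with $r$ linearly independent vectors this yields $2^r$ distinct words. The basis below is hypothetical, and the minimum distance is computed by brute force over all pairs.

```python
from itertools import product

def span_gf2(basis):
    """All linear combinations a_1 v_1 + ... + a_r v_r over GF(2) (Definition 43)."""
    n = len(basis[0])
    words = set()
    for coeffs in product((0, 1), repeat=len(basis)):
        word = [0] * n
        for a, v in zip(coeffs, basis):
            if a:
                word = [(w + x) % 2 for w, x in zip(word, v)]
        words.add(tuple(word))
    return words

def min_distance(code):
    """Smallest Hamming distance between two distinct code words."""
    return min(sum(a != b for a, b in zip(u, v)) for u in code for v in code if u != v)

basis = [(1, 0, 1, 1, 0), (0, 1, 0, 1, 1)]   # a hypothetical linearly independent pair
code = span_gf2(basis)
print(len(code), min_distance(code))          # 4 3
```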
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Definition 44 relative minimum distance 


For a $q$-ary $(n, m, d)$-code $C$, the relative minimum distance of $C$ is defined to be
$$\delta(C) = \frac{d - 1}{n}$$




Definition 45 optimal code 
Let a code alphabet $A$ be of size $q > 1$, $n$ the length of each word, $d$ the minimum distance, and $A_q(n, d)$ the largest possible vocabulary size $m$ such that there exists an $(n, m, d)$-code over $A$. Then any $(n, m, d)$-code $C$ which has $m = A_q(n, d)$ is called an optimal code.

The main coding theory problem is precisely to find the value of $A_q(n, d)$.
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Definition 46 Hamming sphere 
Consider each word as an $n$-tuple. Then all such tuples lying within Hamming distance $r$ of an $n$-tuple $x$ are said to be within the Hamming sphere of radius $r$ around $x$.
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Theorem 26 Hamming bound 
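Let the size of the alphabet be $q = |A|$, the length of a word be $n$, and the Hamming, or minimum, distance be $d$. Then the Hamming, or sphere-packing, bound on the size $m$ of a code dictionary $C$ is given by
$$m \sum_{i=0}^{r} \binom{n}{i} (q - 1)^i \le q^n$$
where
$$r = \left\lfloor \frac{d - 1}{2} \right\rfloor$$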




Proof. Let $c$ be a code word. Let $e(x, y)$ be the number of places in which two words $x$ and $y$ differ. Since there are $q - 1$ possibilities for each differing position, there are $(q - 1)^i$ possible errors when $i$ places are different. And to position these $i$ places there are altogether
$$\binom{n}{i}$$
ways. Therefore the number of all words $w$ such that $e(w, c) \le r$, that is, the number $n_r$ of $n$-tuples in a Hamming sphere of radius $r$ around $c$, is
$$n_r = \sum_{i=0}^{r} \binom{n}{i} (q - 1)^i$$
Since $r = \lfloor (d-1)/2 \rfloor$, the minimum distance of our code satisfies $d(C) > 2r$, that is to say, $d(C) \ge 2r+1$.
In other words, Hamming spheres of radius $r$ around the $m$ code words of $C$
are mutually nonintersecting. There are a total of $q^n$ possible $n$-tuples, that
is, words of length $n$, not all of which are code words; in other words, $m \le q^n$.
And since each sphere contains $n_r$ of these $n$-tuples and the spheres are disjoint,
the $m$ spheres together contain $n_r m$ distinct $n$-tuples, which cannot exceed the
$q^n$ $n$-tuples over the alphabet $A$. Hence,

$$ m \sum_{i=0}^{r} \binom{n}{i} (q-1)^i \le q^n, $$

and thus the theorem follows. □
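The bound can be evaluated directly. What follows is a minimal Python sketch, not part of the original notes, and the function names are illustrative. It counts the $n$-tuples in a Hamming sphere exactly as in the proof and returns the largest integer $m$ the inequality allows.

```python
from math import comb


def sphere_size(n: int, r: int, q: int) -> int:
    """Number of q-ary n-tuples within Hamming distance r of a fixed word."""
    return sum(comb(n, i) * (q - 1) ** i for i in range(r + 1))


def hamming_bound(n: int, d: int, q: int) -> int:
    """Largest integer m with m * sphere_size(n, r, q) <= q**n, where r = (d - 1) // 2."""
    r = (d - 1) // 2
    return q ** n // sphere_size(n, r, q)


def meets_bound_with_equality(n: int, m: int, d: int, q: int) -> bool:
    """True when m disjoint spheres of radius (d - 1) // 2 exactly fill all q**n words."""
    r = (d - 1) // 2
    return m * sphere_size(n, r, q) == q ** n


if __name__ == "__main__":
    print(hamming_bound(7, 3, 2))    # binary, n = 7, d = 3
    print(hamming_bound(23, 7, 2))   # binary, n = 23, d = 7
```

For $n = 7$, $d = 3$ the bound is $2^7/8 = 16$. For $n = 23$, $d = 7$ it is $2^{23}/2048 = 4096$. Both values are attained by known binary codes, the Hamming code of length 7 and the Golay code of length 23, so these parameters give perfect codes in the sense of Definition 47.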
Definition 47 perfect code
A code which attains the Hamming bound with equality is called a perfect code.
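Problem 1 Stirling’s approximation of n!
Let $r$ and $n$ be integers such that $0 < r < n/2$; then prove that,

$$ \left(8n\frac{r}{n}\left(1-\frac{r}{n}\right)\right)^{-1/2} 2^{nH\left(\frac{r}{n},\,1-\frac{r}{n}\right)} \le \binom{n}{r} \le 2^{nH\left(\frac{r}{n},\,1-\frac{r}{n}\right)} $$
$$ \left(8n\frac{r}{n}\left(1-\frac{r}{n}\right)\right)^{-1/2} 2^{nH\left(\frac{r}{n},\,1-\frac{r}{n}\right)} \le \sum_{i=0}^{r}\binom{n}{i} \le 2^{nH\left(\frac{r}{n},\,1-\frac{r}{n}\right)} $$

where $H(x,y)$ is the entropy function whose arguments $x$ and $y$ are
probabilities, and $H(\cdot,\cdot)$ has the unit of bits per symbol. (Hint: Stirling’s
approximation to $n!$, cf MacWilliams and Sloane, 1977)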
Note 1 dictionary’s size
Let $C(n,d)$ be a code with words of length $n$ and minimum distance $d$ between
words. Let $m(n,d)$ be the number of code words in $C(n,d)$. Then the size of
the largest dictionary of $n$-tuples with fractional minimum distance $d_f = d/n$ is,

$$ m_m(n,d_f) = \max_{C(n,d)} m(n,d). $$
Problem 2
From Note 1, show that for $n$ fixed, $m_m(n,d_f)$ is a monotonically nonincreasing
function of $d_f$. Then show that with $d_f$ fixed, $m_m(n,d_f)$ increases exponentially
with $n$.
Definition 48 asymptotic transmission rate
The asymptotic transmission rate is defined to be,
$$ R(d_f) = \lim_{n\to\infty} \frac{1}{n} \log m_m(n,d_f). $$
Also defined are the upper and the lower bounds on this rate,
$$ \overline{R}(d_f) = \limsup_{n\to\infty} \frac{1}{n} \log m_m(n,d_f) $$
and
$$ \underline{R}(d_f) = \liminf_{n\to\infty} \frac{1}{n} \log m_m(n,d_f). $$
Note 2 bounds on rate
Show that $\underline{R}(d_f) \le R(d_f) \le \overline{R}(d_f)$ whenever the limit defining $R(d_f)$ exists.
Example 27 Hamming bound for binary codes
Using the results from Problem 1 we obtain the Hamming bound for the
binary code,
$$ m \le \left(8n\frac{r}{n}\left(1-\frac{r}{n}\right)\right)^{1/2} 2^{n\left(1-H\left(\frac{r}{n},\,1-\frac{r}{n}\right)\right)} \qquad (16) $$
where $r = \lfloor (d-1)/2 \rfloor$.
Equation 16 must hold for all binary dictionaries. Therefore it gives an upper
bound on the maximum dictionary size $m_m(n,d_f)$ over all dictionaries whose
word length is $n$ and whose fractional distance is
$$ d_f = \frac{d}{n} = \frac{2r + \{1 \text{ or } 2\}}{n}, $$
where the choice of 1 or 2 depends on whether $d$ is odd or respectively even.
For large $n$, $r/n \approx d_f/2$, and
$$ m_m(n,d_f) \lesssim \left(8n\frac{d_f}{2}\left(1-\frac{d_f}{2}\right)\right)^{1/2} 2^{n\left(1-H\left(\frac{d_f}{2},\,1-\frac{d_f}{2}\right)\right)}. $$
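The upper bound for the attainable information rate is,
$$ \overline{R}(d_f) = \limsup_{n\to\infty} \frac{1}{n} \log_2 m_m(n,d_f) \le \lim_{n\to\infty} \left\{ \frac{1}{2}\frac{\log_2 n}{n} + \frac{1}{2n}\log_2\left(8\frac{d_f}{2}\left(1-\frac{d_f}{2}\right)\right) + 1 - H\left(\frac{d_f}{2},\,1-\frac{d_f}{2}\right) \right\}. $$
As $n$ approaches infinity the first two terms vanish, so
$$ \overline{R}(d_f) \le 1 - H\left(\frac{d_f}{2},\,1-\frac{d_f}{2}\right). $$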
Theorem 27 Plotkin’s bound
Let $d(c_i,c_j)$ be the Hamming distance between the code words $c_i$ and $c_j$.
Let $d = d(C)$ be the minimum distance between code words, and $\bar d$ the average
distance between code words. If
$$ \frac{d}{n} > \frac{q-1}{q}, $$
then the Plotkin bound is
$$ m_{n,d} \le \frac{d}{d - n\frac{q-1}{q}}. $$
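Proof. The average distance gives an upper bound for the minimum distance, that is, $d \le \bar d$, where
$$ \bar d = \binom{m}{2}^{-1} \sum_{i=2}^{m} \sum_{j=1}^{i-1} d(c_i,c_j). $$
Since the Plotkin bound rests on an upper bound on $\bar d$, we need to maximise
$$ \sum_{i>j} d(c_i,c_j) = \sum_{i>j} \sum_{k=1}^{n} d(c_{ik},c_{jk}) = \sum_{k=1}^{n} \sum_{i>j} d(c_{ik},c_{jk}), $$
where $c_{ik}$ denotes the $k$-th symbol of $c_i$, and $d(c_{ik},c_{jk})$ is 1 if these symbols differ and 0 otherwise.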
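This implies (cf Plotkin, 1960),
$$ \sum_{i>j} d(c_i,c_j) \le \sum_{k=1}^{n} \max_{c_{1k},\dots,c_{mk}\in A} \sum_{i>j} d(c_{ik},c_{jk}), $$
which says that the upper bound is attained by choosing maximising symbols $c_{ik}$
from the alphabet $A$. In each position the maximum is reached when the $m$ symbols are spread as evenly as possible over the $q$ letters of $A$, so that
$$ \max_{c_{1k},\dots,c_{mk}\in A} \sum_{i>j} d(c_{ik},c_{jk}) \le \left(\frac{m}{q}\right)^2 \frac{q(q-1)}{2}. $$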
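Combining these,
$$ \binom{m}{2} d \le \binom{m}{2}\bar d \le n\left(\frac{m}{q}\right)^2\frac{q(q-1)}{2}, $$
that is, $(m-1)d \le mn\frac{q-1}{q}$. Providing that $d > n\frac{q-1}{q}$,
then
$$ m \le \frac{d}{d - n\frac{q-1}{q}}. $$
□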
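As a small illustration, take the binary simplex code. It is an outside example assumed here, not one introduced in these notes. It has $n = 7$, $m = 8$, and every non-zero code word has weight 4. So $d = 4 > 7 \cdot \tfrac12$, and the Plotkin bound gives $m \le 4/(4 - 3.5) = 8$, which the code attains. The Python sketch below builds the code and checks this. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def plotkin_bound(n: int, d: int, q: int) -> Fraction:
    """d / (d - n(q-1)/q), valid only when d > n(q-1)/q."""
    theta = Fraction(q - 1, q)
    if d <= theta * n:
        raise ValueError("the Plotkin bound needs d > n(q-1)/q")
    return Fraction(d) / (d - theta * n)


def simplex_code(r: int) -> list:
    """Binary code whose generator columns are all non-zero r-bit vectors."""
    columns = [v for v in product((0, 1), repeat=r) if any(v)]
    code = []
    for msg in product((0, 1), repeat=r):
        word = tuple(sum(m * c for m, c in zip(msg, col)) % 2 for col in columns)
        code.append(word)
    return code


def min_distance(code: list) -> int:
    """Smallest Hamming distance between two distinct code words."""
    return min(
        sum(x != y for x, y in zip(u, v))
        for i, u in enumerate(code)
        for v in code[i + 1:]
    )


if __name__ == "__main__":
    code = simplex_code(3)
    n, d = len(code[0]), min_distance(code)
    print(n, len(code), d, plotkin_bound(n, d, 2))
```

Exact fractions are used so that the bound is compared with $m$ without rounding.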
Note 3 double summations
Notice how, for a summand that is symmetric in $i$ and $j$ such as $d(c_i,c_j)$,
$$ \sum_{i=1}^{m-1}\sum_{j=i+1}^{m} (\cdot) = \sum_{i=2}^{m}\sum_{j=1}^{i-1} (\cdot). $$
Equivalent to this is
$$ \sum_{i<j} (\cdot) = \sum_{i>j} (\cdot). $$
Note 4 Plotkin’s bound
If
$$ d_f > \frac{q-1}{q}, $$
then
$$ m_m(n,d_f) \le \frac{d_f}{d_f - \frac{q-1}{q}}, $$
and then
$$ \overline{R}(d_f) = \limsup_{n\to\infty} \frac{1}{n} \log m_m(n,d_f) = 0. $$
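On the other hand, if
$$ d_f \le \frac{q-1}{q}, $$
then we start from
$$ m(n,d) = \sum_{x\in A} m_x(n,d), $$
where $m(n,d) = |C(n,d)|$, $C(n,d)$ being any code consisting of $n$-tuples whose
minimum distance is at least $d$, and $m_x(n,d) = |C_x(n,d)|$, $C_x(n,d)$ comprising
all $n$-tuples in $C(n,d)$ which begin with the symbol $x$. Deleting the common first symbol from the words of $C_x(n,d)$ leaves a code of length $n-1$ whose minimum distance is still at least $d$. Hence, writing $m(n-1,d)$ for the size of the largest such code,
$$ m(n,d) \le q \max_{x\in A} m_x(n,d) \le q\, m(n-1,d) \le \cdots \le q^{n-k} m(k,d). $$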
Provided $k$ is small enough, we may yet use the Plotkin bound, hence
$$ m(n,d) \le q^{n-k} \frac{d/k}{d/k - \frac{q-1}{q}} $$
when
$$ \frac{d}{k} > \frac{q-1}{q}. $$
Choose $k$ the largest integer satisfying
$$ \frac{d}{k} - \frac{1}{qk} \ge \frac{q-1}{q}. $$
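Then,
$$ k + r = \frac{qd-1}{q-1}, $$
where $0 \le r < 1$. And then,
$$ m(n,d) \le q^{n-k}\frac{qd}{qd-(q-1)k} = q^{\,n-\frac{qd-1}{q-1}+r}\,\frac{qd}{(q-1)r+1}. $$
Finally, since $q^r \le (q-1)r + 1$ for $0 \le r < 1$ (Problem 5),
$$ m(n,d) \le q^{\,n-\frac{qd-1}{q-1}}\, qd, $$
and, if $d_f$ is fixed and $n$ becomes large,
$$ \overline{R}(d_f) \le \log_2 q \left(1 - \frac{q}{q-1}\, d_f\right). $$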
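To see how the two asymptotic upper bounds compare, here is a minimal Python sketch for the binary case ($q = 2$, logarithms to base 2). The names `hamming_rate_bound` and `plotkin_rate_bound` are illustrative. The Plotkin expression is clipped at 0, in line with Note 4 for $d_f > (q-1)/q$.

```python
from math import log2


def entropy2(p: float) -> float:
    """Binary entropy H(p, 1 - p) in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def hamming_rate_bound(df: float) -> float:
    """Asymptotic Hamming bound for binary codes: 1 - H(df/2, 1 - df/2)."""
    return 1.0 - entropy2(df / 2)


def plotkin_rate_bound(df: float, q: int = 2) -> float:
    """Asymptotic Plotkin bound log2(q) * (1 - q df / (q - 1)), clipped at 0."""
    return max(0.0, log2(q) * (1.0 - q * df / (q - 1)))


if __name__ == "__main__":
    for df in (0.1, 0.2, 0.3, 0.4):
        print(df, round(hamming_rate_bound(df), 3), round(plotkin_rate_bound(df), 3))
```

Numerically, the Hamming bound is the tighter of the two for small $d_f$ and the Plotkin bound for larger $d_f$. For instance, at $d_f = 0.1$ they give about 0.714 and 0.8, and at $d_f = 0.4$ about 0.278 and 0.2.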
Problem 5 upper bound
Prove that,
$$ \frac{q^r}{(q-1)r+1} \le 1 $$
for $0 \le r < 1$.
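Proposition 1 Elias bound for binary alphabet
Let $C$ be a code containing $m$ binary $n$-tuples, with minimum distance $d_C$, and let $m_d(x)$ be the number of code words
within distance $d$ of an $n$-tuple $x$. Further, let $A$ be a new code whose
code words are the difference vectors $a_1,\dots,a_{M_d}$, such that $a_i = c_i \ominus x$,
$i = 1,\dots,M_d$, where $c_1,\dots,c_{M_d}$ are the code words within distance $d$ of $x$ and $\ominus$ denotes modulo subtraction of the vectors, element
by element. Assume that $d < n/2$ and that both $d$ and $m$ are large enough such
that $m_d(x) \ge 2$. Then,
$$ d_C \le 2d\left(1-\frac{d}{n}\right)\frac{M_d}{M_d-1}, \qquad (17) $$
where $M_d = m_d(x) \ge \left\lceil \frac{m}{2^n}\sum_{i=0}^{d}\binom{n}{i} \right\rceil$ for a suitably chosen $x$.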
Proof. Since $C$ is a code of binary $n$-tuples, there are
$$ \sum_{i=0}^{d}\binom{n}{i} $$
$n$-tuples within distance $d$ of each code word. This gives the total of
$$ m\sum_{i=0}^{d}\binom{n}{i} $$
$n$-tuples, counted with multiplicity, in the Hamming spheres around the $m$ code words.

There are $m_d(x)$ code words within distance $d$ of any $n$-tuple $x$. For $x$ in $X^n$, $c$
in $C$ and $d(x,c) \le d$, the number of pairs $(x,c)$ can be counted either by picking first
$x$ and then $c$, or first $c$ and then $x$, hence
$$ \sum_{x\in X^n} m_d(x) = m\sum_{i=0}^{d}\binom{n}{i}. $$
Since $X^n$ contains $2^n$ $n$-tuples, there consequently exists some value of $x$
such that,
$$ m_d(x) \ge \left\lceil \frac{m}{2^n}\sum_{i=0}^{d}\binom{n}{i} \right\rceil. $$
Let $c_1,\dots,c_{M_d}$ be the code words in $C$ that lie within Hamming distance $d$ of the
$n$-tuple $x$, and consider the difference vectors $a_1,\dots,a_{M_d}$ such that $a_i = c_i \ominus x$.
Then $A$ is a set of localised code words of $C$, and
$$ a_i \ominus a_j = (c_i \ominus x) \ominus (c_j \ominus x) = c_i \ominus c_j, $$
so that
$$ d(c_i,c_j) = d(a_i,a_j). $$
Thus,
$$ M_d = m_d(x) \ge \left\lceil \frac{m}{2^n}\sum_{i=0}^{d}\binom{n}{i} \right\rceil. $$
Also, $d_A \ge d_C$ and $w(a_i) \le d$ for every $n$-tuple $a_i$ in $A$, where the Hamming weight
$w(a_i)$ is the number of nonzero elements in $a_i$.
Next, applying the average-distance Plotkin bound to the localised code $A$, one
obtains
$$ d_C \le d_A \le \bar d_A = \binom{M_d}{2}^{-1} \sum_{i>j} d(a_i,a_j). \qquad (18) $$
We maximise the right-hand side of Equation 18 to get rid of the dependence on $A$. We relax
our restriction on $w(a_i)$ above to a restriction on the total weight of all the $a_i$ in $A$, thus,
$$ \sum_{a_i\in A} w(a_i) \le M_d\, d. \qquad (19) $$
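Then, let $z_k$ be the number of code words in $A$ having a 0 in the $k$-th position.
We maximise
$$ \sum_{i>j} d(a_i,a_j) = \sum_{k=1}^{n} z_k (M_d - z_k) \qquad (20) $$
subject to the constraint of Equation 19,
$$ \sum_{k=1}^{n} (M_d - z_k) \le M_d\, d. \qquad (21) $$
By setting
$$ z_k = M_d\left(1 - \frac{d}{n}\right), \qquad (22) $$
we maximise the right-hand side of Equation 20 under the constraint in Equation 21. Since $d < n/2$, each term $z_k(M_d - z_k)$ increases with $M_d - z_k$ up to $M_d/2$, and spreading the total weight evenly over the $n$ positions maximises the sum. From
Equations 18, 20 and 22 we obtain Equation 17. □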
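Equation 20 and the maximisation that follows can be checked on examples. The sketch below is hypothetical Python and assumes code words are stored as tuples of 0s and 1s. It compares the pairwise distance sum with the column form $\sum_k z_k(M_d - z_k)$. It also evaluates the right-hand side of Equation 17 for a random set of words of weight at most $d$, with $d < n/2$, standing in for the localised code $A$.

```python
import random


def pairwise_distance_sum(words: list) -> int:
    """Left-hand side of Equation 20: sum of d(a_i, a_j) over i > j."""
    return sum(
        sum(x != y for x, y in zip(u, v))
        for i, u in enumerate(words)
        for v in words[:i]
    )


def column_form(words: list) -> int:
    """Right-hand side of Equation 20: sum over positions k of z_k (M - z_k)."""
    M, n = len(words), len(words[0])
    total = 0
    for k in range(n):
        z = sum(1 for w in words if w[k] == 0)  # words with a 0 in position k
        total += z * (M - z)
    return total


def elias_rhs(n: int, d: int, M: int) -> float:
    """Right-hand side of Equation 17: 2 d (1 - d/n) M / (M - 1)."""
    return 2 * d * (1 - d / n) * M / (M - 1)


def random_light_words(n: int, d: int, M: int, seed: int = 0) -> list:
    """M random binary n-tuples, each of weight at most d (a stand-in for the localised code A)."""
    rng = random.Random(seed)
    words = []
    for _ in range(M):
        ones = set(rng.sample(range(n), rng.randint(0, d)))
        words.append(tuple(1 if k in ones else 0 for k in range(n)))
    return words


if __name__ == "__main__":
    n, d, M = 20, 6, 15
    A = random_light_words(n, d, M)
    average = pairwise_distance_sum(A) / (M * (M - 1) / 2)
    print(pairwise_distance_sum(A) == column_form(A), average <= elias_rhs(n, d, M))
```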
Algorithm 2 Gilbert bound, a lower bound to $m$ for $n$, $d$ and $q$.

S ← X^n
for all c_i in S do
    for all n-tuples c_j ≠ c_i within distance d − 1 of c_i do
        remove c_j from S
    endfor
endfor
Note 5 Gilbert bound
For the Gilbert bound algorithm, Algorithm 2, initially $|S| = |X|^n = q^n$. For each
$c_i$ chosen, at most
$$ \sum_{j=0}^{d-1} (q-1)^j \binom{n}{j} $$
$n$-tuples are removed from further consideration (including $c_i$ itself). If
$$ (m-1) \sum_{j=0}^{d-1} (q-1)^j \binom{n}{j} < q^n, $$
then the algorithm will not stop after $m-1$ code-word selections.
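Algorithm 2 and Note 5 translate almost line for line into code. The following Python sketch is a hypothetical illustration. It scans the $q$-ary $n$-tuples in lexicographic order and keeps a tuple whenever it has not been removed, that is, whenever it is at distance at least $d$ from every tuple kept so far. This is what the nested loops of Algorithm 2 amount to. By Note 5 the result has at least $\lceil q^n / \sum_{j=0}^{d-1}(q-1)^j\binom{n}{j}\rceil$ words.

```python
from itertools import product
from math import comb


def hamming_distance(u: tuple, v: tuple) -> int:
    """Number of positions in which u and v differ."""
    return sum(1 for x, y in zip(u, v) if x != y)


def greedy_gilbert_code(n: int, d: int, q: int = 2) -> list:
    """Greedy form of Algorithm 2: keep each q-ary n-tuple, in lexicographic order,
    that is at distance at least d from every tuple kept before it."""
    code = []
    for w in product(range(q), repeat=n):
        if all(hamming_distance(w, c) >= d for c in code):
            code.append(w)
    return code


def gilbert_bound(n: int, d: int, q: int = 2) -> int:
    """Smallest m guaranteed by Note 5: ceil(q^n / sum_{j<d} C(n, j) (q - 1)^j)."""
    volume = sum(comb(n, j) * (q - 1) ** j for j in range(d))
    return -(-(q ** n) // volume)


if __name__ == "__main__":
    code = greedy_gilbert_code(7, 3)
    print(len(code), gilbert_bound(7, 3))
```

The greedy scan usually does much better than the guarantee, because the spheres it removes overlap.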
Definition 49 Group, ring and field
A non-empty set $G$ with a binary composition is called a group if the composition
is associative, if there is an identity element in $G$, and if each element of $G$ has
an inverse in $G$. The group $G$ is called Abelian if the composition in it is
commutative for any two elements in $G$.
A non-empty set $R$ with two binary compositions, called addition and
multiplication, defined on it is called a ring if $R$ is an Abelian group with
respect to addition, if multiplication in $R$ is associative, and
if the distributive laws hold for all elements in $R$. A set $F$ having at least two
elements with two compositions, again called addition and multiplication,
defined on it is called a field if it is a commutative ring with identity in which every
non-zero element has an inverse with respect to multiplication. A
field having only a finite number of elements is called a finite or Galois field.
Example 28 finite field
The set
$$ \mathbb{F}_p = \{0,\dots,p-1\}, $$
in which addition and multiplication are defined modulo $p$, where $p$ is a prime
integer, is a finite field. For $p = 2$ we have $\mathbb{F}_2 = \{0,1\}$, which is denoted by
$B$. The set $B^n$ of all ordered $n$-tuples, or sequences of length $n$ ($n$ a positive
integer) with each entry in the field $B$, together with the composition defined as
the componentwise sum of two sequences in $B^n$, is an Abelian group. The zero
sequence of length $n$ is the identity of $B^n$ and each element in $B^n$ is its own inverse.
Definition 50 code words
A binary block $(b,n)$-code comprises an encoding function
$$ E : B^b \to B^n $$
and a decoding function
$$ D : B^n \to B^b. $$
The images of $E$ are called code words.
Definition 51 distance function
Let $a$ and $b$ be two binary sequences in $B^n$. The distance $d(a,b)$ between $a$
and $b$ is defined as
$$ d(a,b) = \sum_{i=1}^{n} \delta_i, $$
where
$$ \delta_i = \begin{cases} 0 & \text{if } a_i = b_i, \\ 1 & \text{if } a_i \ne b_i. \end{cases} $$

Definition 52 weight function
The weight $w(a)$ of $a$ in $B^n$ is the number of non-zero components of the
sequence $a$.
Theorem 28 weight and distance
Let $a$ and $b$ be any two sequences in $B^n$. Then $d(a,b) = w(a+b)$.
Proof. A position $i$, $1 \le i \le n$, contributes 1 to $d(a,b)$ if and only if $a_i \ne b_i$.
But this is the case if and only if $a_i + b_i = 1$, and then position $i$ contributes 1 to
$w(a+b)$. □
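Theorem 28 is easy to confirm exhaustively for small $n$. Here is a minimal Python sketch. Representing elements of $B^n$ as tuples of 0s and 1s is an assumption of this example, and the helper names are illustrative.

```python
from itertools import product


def add(a: tuple, b: tuple) -> tuple:
    """Componentwise sum in B^n, i.e. addition modulo 2."""
    return tuple((x + y) % 2 for x, y in zip(a, b))


def weight(a: tuple) -> int:
    """Number of non-zero components of a (Definition 52)."""
    return sum(1 for x in a if x != 0)


def distance(a: tuple, b: tuple) -> int:
    """Number of positions in which a and b differ (Definition 51)."""
    return sum(1 for x, y in zip(a, b) if x != y)


def theorem28_holds(n: int) -> bool:
    """Check d(a, b) = w(a + b) for every pair a, b in B^n."""
    words = list(product((0, 1), repeat=n))
    return all(distance(a, b) == weight(add(a, b)) for a in words for b in words)


if __name__ == "__main__":
    print(theorem28_holds(4))
```

The same helpers also confirm the remark in Example 28 that every element of $B^n$ is its own inverse, since `add(a, a)` is always the zero tuple.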
Definition 53 homomorphism
Let $X$ and $Y$ be two groups. Then a map
$$ f : X \to Y $$
which satisfies the property
$$ f(x_1x_2) = f(x_1)f(x_2) $$
for all $x_1$ and $x_2$ in $X$ is called a homomorphism. Further, the homomorphism
$f$ is called a monomorphism if it is one to one, and it is called an isomorphism
if it is both one to one and onto.
Definition 54 group code
A block code is called a group code if all its code words form an additive
group.
Definition 55 generator matrix
A $b \times n$ matrix $G$ over $B$, where $b < n$, is called an encoding or generator
matrix if $G$ is of the form
$$ G = [I_b \; A], $$
where $I_b$ is an identity matrix of dimension $b$ and $A$ a $b \times (n-b)$ matrix.
An encoding function $E : B^b \to B^n$ is defined by
$$ E(x) = xG $$
for all $x$ in $B^b$.
Theorem 29 monomorphic generator matrix
The encoding function $E : B^b \to B^n$ given by $E(x) = xG$ for all $x$ in $B^b$,
where $G$ is a $b \times n$ generator matrix, is a monomorphism.
Proof. Both $B^b$ and $B^n$ are additive Abelian groups. Then for all $x$ and $y$
in $B^b$ we know that $x+y$ is also in $B^b$ and
$$ E(x+y) = (x+y)G = xG + yG = E(x) + E(y). $$
Thus $E$ is a homomorphism. Further, as the first part of $G$ is $I_b$, it follows
that the first $b$ entries of $E(x)$ are $x$ itself. Therefore the matrix encoding method gives for
each binary message word a distinct code word. In other words, the mapping
$E$ is one to one, which means that it is a monomorphism. □


Coding theory, Group, polynomial, and Hamming codes, 25** November 2005 -8- 
From 8*” November 2005 , as of 14°” January, 2007 


Definition 56 matrix code
A code generated by a generator matrix is called a matrix code.

Theorem 30 matrix code a group code
A matrix code is a group code.
Proof. The code words generated by $G$ are closed under addition, since $x_1G + x_2G = (x_1+x_2)G$, and their addition is associative, since
$$ x_1G + (x_2G + x_3G) = (x_1G + x_2G) + x_3G. $$
They have a unique identity, namely the zero code word $0G$, the zero $1 \times n$ matrix, and each of them
is its own inverse. □
Definition 57 parity check code
A $(b,b+1)$ parity check code is the code generated by an encoding function
$E : B^b \to B^{b+1}$ defined by
$$ E(a_1\cdots a_b) = a_1\cdots a_b a_{b+1}, $$
where
$$ a_{b+1} = \begin{cases} 1 & \text{if } w(a) \text{ is odd}, \\ 0 & \text{if } w(a) \text{ is even}, \end{cases} $$
$w(a)$ being $w(a_1\cdots a_b)$.
Theorem 31 parity check code
The $(b,b+1)$ parity check code is a group code.
Proof. Let our unencoded binary words be $a = a_1\cdots a_b$, $b = b_1\cdots b_b$, and $c =
c_1\cdots c_b$ such that $c_i = a_i + b_i$ for $i = 1,\dots,b$, and let the coded words of $a$ and $b$ be
respectively $\hat a = a\,a_{b+1}$ and $\hat b = b\,b_{b+1}$. The weight of $c$ is odd if and only if either $a$ has odd weight
while $b$ has even weight or vice versa, and when this is the case we have either $a_{b+1} = 1$ and
$b_{b+1} = 0$, or $a_{b+1} = 0$ and $b_{b+1} = 1$. Either way we have
$$ c_{b+1} = 1 = a_{b+1} + b_{b+1}. $$
Next, $c$ has even weight if and only if $a$ and $b$ have either both odd or both even weight. When
either of these is the case,
$$ a_{b+1} + b_{b+1} = 0 = c_{b+1}. $$
Hence $\hat a + \hat b = c\,c_{b+1}$ is a parity-check code word. The zero word is the identity and the inverse
of each word is that word itself. Therefore the set of all code words forms a group.
□
Theorem 32 minimum distance
The minimum distance of a group code equals the minimum of the weights
of its non-zero code words.
Proof. Let $d_m$ be the minimum distance of the group code, and $w_m$ the
minimum of the weights of its non-zero code words. Then there
exist distinct code words $a$ and $b$ such that
$$ d_m = d(a,b) = w(a+b) \ge w_m, $$
since $a + b$ is a non-zero code word, the code being a group. Now, by the definition of $w_m$ there exists a non-zero code word $c$ such that
$$ w_m = w(c) = d(c,0) \ge d_m, $$
since the zero word is a code word. Hence $d_m = w_m$. □
Example 29 generator and parity check matrices
Let the generator matrix be
$$ G = \begin{pmatrix} 1&0&0&1&1&1 \\ 0&1&0&1&0&1 \\ 0&0&1&0&1&1 \end{pmatrix}. $$
The dimension of $G$ is $b \times n$, which in this case is $3 \times 6$. Let $a_1a_2a_3a_4a_5a_6$ be
the code word and $a_1a_2a_3$ the original word; then
$$ (a_1\ a_2\ a_3\ a_4\ a_5\ a_6) = (a_1\ a_2\ a_3)\,G $$
and then,
$$ a_4 = a_1 + a_2, \quad a_5 = a_1 + a_3, \quad a_6 = a_1 + a_2 + a_3. $$
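In other words,
$$ \left.\begin{aligned} a_1 + a_2 + a_4 &= 0 \\ a_1 + a_3 + a_5 &= 0 \\ a_1 + a_2 + a_3 + a_6 &= 0 \end{aligned}\right\} \text{parity check equations} $$
These parity check equations are then, in matrix form,
$$ \begin{pmatrix} 1&1&0&1&0&0 \\ 1&0&1&0&1&0 \\ 1&1&1&0&0&1 \end{pmatrix} \begin{pmatrix} a_1\\a_2\\a_3\\a_4\\a_5\\a_6 \end{pmatrix} = \begin{pmatrix}0\\0\\0\end{pmatrix}. $$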
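The matrix
$$ H = \begin{pmatrix} 1&1&0&1&0&0 \\ 1&0&1&0&1&0 \\ 1&1&1&0&0&1 \end{pmatrix} $$
is called the parity check matrix of the code. Then $G = (I_3\ A)$ and $H = (A^T\ I_3)$, where
$$ A = \begin{pmatrix} 1&1&1 \\ 1&0&1 \\ 0&1&1 \end{pmatrix} $$
and
$$ A^T = \begin{pmatrix} 1&1&0 \\ 1&0&1 \\ 1&1&1 \end{pmatrix}. $$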
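The relation between $G$ and $H$ in this example can be checked mechanically. In the Python sketch below, the helper names are hypothetical and the matrices are copied from the example. Every code word $aG$ has zero syndrome. By Theorem 32, the smallest weight of a non-zero code word gives the minimum distance of the code.

```python
from itertools import product

# Generator and parity check matrices of Example 29, over B = {0, 1}.
G = [[1, 0, 0, 1, 1, 1],
     [0, 1, 0, 1, 0, 1],
     [0, 0, 1, 0, 1, 1]]
H = [[1, 1, 0, 1, 0, 0],
     [1, 0, 1, 0, 1, 0],
     [1, 1, 1, 0, 0, 1]]


def encode(msg, G):
    """Code word msg * G, computed modulo 2."""
    n = len(G[0])
    return tuple(sum(m * G[i][j] for i, m in enumerate(msg)) % 2 for j in range(n))


def syndrome(word, H):
    """H word^T, computed modulo 2."""
    return tuple(sum(h * w for h, w in zip(row, word)) % 2 for row in H)


def all_code_words(G):
    """All 2^b code words of the matrix code generated by G."""
    return [encode(msg, G) for msg in product((0, 1), repeat=len(G))]


if __name__ == "__main__":
    for c in all_code_words(G):
        print(c, syndrome(c, H))
```

For this code the minimum non-zero weight is 3, so the code corrects single errors.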
Example 30
The parity check code in Definition 57 is in fact a matrix code given by the $b \times (b+1)$
generator matrix
$$ G = \begin{pmatrix} 1&0&\cdots&0&1 \\ 0&1&\cdots&0&1 \\ \vdots&&\ddots&&\vdots \\ 0&0&\cdots&1&1 \end{pmatrix}, $$
whose parity check matrix is the $1 \times (b+1)$ matrix $H = (1\ \cdots\ 1)$.
Definition 58 syndrome
The syndrome of a word $r \in B^n$ is
$$ s = Hr^T. $$
Algorithm 3 the syndrome decoding algorithm.

r ← r_1 ⋯ r_b r_{b+1} ⋯ r_n
s ← H r^T
if s = 0 then
    b_r ← (r_1 ⋯ r_b)
elseif s matches the i-th column of H then
    c_r ← (r_1 ⋯ r_{i−1} (r_i + 1) r_{i+1} ⋯ r_n)
    b_r ← (c_{r,1} ⋯ c_{r,b})
else
    at least two errors have occurred in the transmission
endif
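Algorithm 3 can be sketched in Python as follows. This is an illustration only. Words are tuples of 0s and 1s, and positions are counted from 0 in the code, whereas the notes count from 1. The parity check matrix of Example 29 is reused as test data.

```python
# Parity check matrix of Example 29 (b = 3, n = 6), used here as a test case.
H = [[1, 1, 0, 1, 0, 0],
     [1, 0, 1, 0, 1, 0],
     [1, 1, 1, 0, 0, 1]]
b = 3


def syndrome(r, H):
    """s = H r^T over B."""
    return tuple(sum(h * x for h, x in zip(row, r)) % 2 for row in H)


def syndrome_decode(r, H, b):
    """Algorithm 3: return the b message digits, or None if two or more errors are detected."""
    s = syndrome(r, H)
    if not any(s):                       # s = 0: accept r as a code word
        return tuple(r[:b])
    columns = [tuple(row[i] for row in H) for i in range(len(r))]
    if s in columns:                     # s matches the i-th column of H
        i = columns.index(s)
        c = list(r)
        c[i] ^= 1                        # flip the i-th digit back
        return tuple(c[:b])
    return None                          # at least two errors have occurred


if __name__ == "__main__":
    sent = (1, 1, 0, 0, 1, 0)            # code word for the message 110
    received = (1, 0, 0, 0, 1, 0)        # second digit flipped in transmission
    print(syndrome_decode(sent, H, b), syndrome_decode(received, H, b))
```

Here the columns of $H$ cover only six of the seven non-zero syndromes. A double error whose syndrome is the missing one, $(1,1,0)^T$, therefore falls into the last branch.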
Theorem 33 parity check
An $(n-b) \times n$ parity check matrix $H$ will decode all single errors correctly if
and only if the columns of $H$ are distinct and non-zero.
Proof. Suppose the $i$-th column of $H$ is zero, and let $e$ be a word of
weight 1 having 1 in the $i$-th position and 0 elsewhere. Then for any code
word $b$, we have
$$ H(b+e)^T = Hb^T + He^T = 0. $$
So our decoding procedure treats $b + e$ as a code word, $D(b+e) = b+e$, and the error vector $e$
goes undetected.
Next, suppose that the $i$-th and the $j$-th columns of $H$ are identical. Let $e^i$ and
$e^j$ be words of length $n$ with 1 in the $i$-th and respectively $j$-th position and 0
elsewhere. Then for any code word $b$, we have
$$ H(b+e^i)^T = Hb^T + H(e^i)^T = H(e^i)^T = H(e^j)^T = Hb^T + H(e^j)^T = H(b+e^j)^T. $$
We are unable to decide whether the error occurred in the $i$-th or the $j$-th
position.
Conversely, suppose all the columns of $H$ are distinct and non-zero. Then
for any code word $b$ and any error vector $e$ of weight 1 having 1 in the $i$-th
position,
$$ H(b+e)^T = Hb^T + He^T = 0 + He^T = He^T, $$
which is the $i$-th column of $H$ and matches no other column. Our decoding procedure therefore gives $D(b+e) = b$, so every single error is
corrected. □
Theorem 34 generator and parity-check matrices
If
$$ G = (I_b\ A) $$
is a $b \times n$ generator matrix of a code, then
$$ H = (A^T\ I_{n-b}) $$
is the unique parity check matrix of this form for the same code. If
$$ H = (B\ I_{n-b}) $$
is an $(n-b) \times n$ parity check matrix, then
$$ G = (I_b\ B^T) $$
is the unique generator matrix of this form for the same code.
Proof. Let the original word be $a \in B^b$ and $c$ be the code word corresponding
to $a$ with respect to the code given by the generator matrix $G$. Then $c = aG$.
Let $a$ be $a_1\cdots a_b$. Since the first $b$ columns of $G$ form an identity matrix, it
follows from $c = aG$ that $c_i = a_i$ for all $1 \le i \le b$. Let $\tilde c = c_{b+1}\cdots c_n$; then
$c = c_1\cdots c_b c_{b+1}\cdots c_n$ and $c = (a\ \tilde c)$. Then,
$$ \begin{aligned} Hc^T &= (A^T\ I_{n-b})(aG)^T = (A^T\ I_{n-b})\,G^T a^T \\ &= (A^T\ I_{n-b}) \begin{pmatrix} I_b \\ A^T \end{pmatrix} a^T = (A^T I_b + I_{n-b} A^T)\, a^T = (A^T + A^T)\, a^T \\ &= 0 \cdot a^T = 0. \end{aligned} $$
Therefore $c$ is the code word corresponding to the original word $a$ in the code given
by the parity check matrix $H$.

Now suppose, conversely, that $c$ is the code word corresponding to the original word $a$ as
above in the code obtained from the parity check matrix $H = (A^T\ I_{n-b})$. Then
$c_i = a_i$ for all $1 \le i \le b$ and $Hc^T = 0$. Let $\tilde c = c_{b+1}\cdots c_n$. Then
$A^T a^T + I_{n-b}\tilde c^T = 0$, so that $\tilde c^T = A^T a^T$, since $-1 = 1$ in $B$. Therefore $\tilde c = aA$, and
$$ c = (a\ \tilde c) = (aI_b\ \ aA) = a(I_b\ A) = aG. $$
Hence $c$ is the code word corresponding to the original word $a$ in the code defined
by the generator matrix $G$. So far we have proved that the codes determined by $G$ and
$H$ are identical.
Suppose that to $G = (I_b\ A)$ there corresponds another parity check matrix $H_1 =
(B\ I_{n-b})$. Let $e^i$ be the original word with 1 in the $i$-th position and 0 elsewhere.
The corresponding code word is $e^iG$, that is, the $i$-th row of $G$, or we may write
$e^iG = (e^i\ \alpha^i)$, where $\alpha^i$ is the $i$-th row of $A$. Since $H_1$ is a parity check matrix of
the code defined by $G$, it follows that
$$ H_1 (e^i\ \alpha^i)^T = (B\ I_{n-b}) \begin{pmatrix} (e^i)^T \\ (\alpha^i)^T \end{pmatrix} = B(e^i)^T + (\alpha^i)^T = 0. $$
Therefore $(\alpha^i)^T$ matches the $i$-th column of $B$, or equivalently $\alpha^i$ matches the $i$-th
row of $B^T$. Then the $i$-th row of $A$ is identical to the $i$-th column of $B$. And this is
true for all $1 \le i \le b$, so we have $B = A^T$ and therefore $H_1 = H$. Hence, to a given
$G$ there corresponds a unique $H = (A^T\ I_{n-b})$. A similar argument also holds if we
start with a given parity check matrix $H$. □
Definition 59 dual codes
Let $C$ be a $(b,n)$ code obtained from the generator matrix
$$ G = [I_b\ A]. $$
Then the $(n-b,n)$ matrix code generated by its parity check matrix
$$ H = [A^T\ I_{n-b}] $$
is called the dual code $C^\perp$ of $C$.


Definition 60 coset
Two words $x$ and $y$ are said to be in the same coset if and only if $y = x + c$
for some code word $c$ in $C$.
Theorem 35 cosets
Two words $x$ and $y$ in $B^n$ are in the same coset of $C$ if and only if they have
the same syndrome.
Proof. By Definition 60, $x$ and $y$ are in the same coset if and only if
$$ y = x + c $$
for some $c$ in $C$, which in turn is true if and only if $x + y = c$ is in $C$. Since a word lies in $C$ if and only if its syndrome is zero, it
follows from this that
$$ \begin{aligned} H(x+y)^T &= 0 \\ H(x^T + y^T) &= 0 \\ Hx^T + Hy^T &= 0 \\ Hx^T &= Hy^T, \end{aligned} $$
each step being reversible over $B$. □
Definition 61 vector space
Let $F$ be a field. Then a non-empty set $V$ is called a vector space over $F$ if $V$
and an addition form an Abelian group, and for every $a$ in $F$ and $v$ in $V$ there is
a uniquely defined element $av$ in $V$ such that for any $v$, $v_1$ and $v_2$ in $V$ and
any $a$ and $b$ in $F$,
$$ a(v_1 + v_2) = av_1 + av_2, $$
$$ (a+b)v = av + bv, $$
$$ (ab)v = a(bv), $$
and
$$ 1v = v, $$
1 being the identity of $F$.
Definition 62 linear-dependence
Let $V$ be a vector space over a field $F$. Then a set $\{v_1,\dots,v_n\}$ of elements
$v_i$ in $V$ is said to be linearly independent if
$$ a_1v_1 + \cdots + a_nv_n = 0 $$
for $a_1,\dots,a_n$ in $F$ implies $a_1 = \cdots = a_n = 0$. A set $\{v_1,\dots,v_n\}$ is called a
basis of $V$ if its elements $v_1,\dots,v_n$ are linearly independent over $F$
and every element of $V$ may be expressed in the form $a_1v_1 + \cdots + a_nv_n$, where
all $a_i$, $i = 1,\dots,n$, are in $F$. Then $V$ is said to be of dimension $n$ over $F$,
$\dim V = n$. A map $f : V \to W$ from one vector space to another, where $V$
and $W$ are vector spaces over the same field $F$, is called an isomorphism if
the map $f$ is one to one and onto and, for all $v$, $v_1$ and $v_2$ in $V$ and $a$ in $F$,
$f(v_1 + v_2) = f(v_1) + f(v_2)$ and $f(av) = af(v)$.
Theorem 36 isomorphic vector spaces
Let two vector spaces $V$ and $W$ over the same field $F$ have the same finite dimension.
Then $V$ and $W$ are isomorphic.
Proof. Let $\dim V = \dim W = n$. Let $\{x_1,\dots,x_n\}$ be a basis of $V$ over $F$, and
$\{y_1,\dots,y_n\}$ a basis of $W$ over $F$.

Since every element of $V$ can be uniquely written as $a_1x_1 + \cdots + a_nx_n$ for some
$a_i$ in $F$, the map $f : V \to W$ given by
$$ f(a_1x_1 + \cdots + a_nx_n) = a_1y_1 + \cdots + a_ny_n $$
for $a_i$ in $F$ is well defined. It preserves sums and scalar multiples; thus $f$ is a homomorphism.

Since $f(a_1x_1 + \cdots + a_nx_n) = 0$ implies $a_1y_1 + \cdots + a_ny_n = 0$, which by the linear independence of the $y_i$ implies $a_1 = \cdots = a_n = 0$,
which in turn implies $a_1x_1 + \cdots + a_nx_n = 0$, the map $f$ is one to one. Then, since every
element of $W$ is of the form $a_1y_1 + \cdots + a_ny_n$, which is equal to $f(a_1x_1 + \cdots + a_nx_n)$
for some $a_1,\dots,a_n$ in $F$, $f$ is also onto. Hence $f$ is an isomorphism.
Definition 63 polynomial codes
Let
$$ g(x) = g_0 + \cdots + g_kx^k $$
be a polynomial in $F[x]$. We call the polynomial code with encoding or generating
polynomial $g(x)$ a code which encodes each original word of the message
$a = (a_0,\dots,a_{b-1})$, corresponding to
$$ a(x) = a_0 + \cdots + a_{b-1}x^{b-1}, $$
into the code word $b = (b_0,\dots,b_{b+k-1})$, which corresponds to the code polynomial
$$ b(x) = b_0 + \cdots + b_{b+k-1}x^{b+k-1} = a(x)g(x). $$
Note 7 assumptions for polynomial codes
We assume for our generating polynomial that $g_0 \ne 0$ and $g_k \ne 0$. To justify
this assumption, suppose we have
$$ g(x) = g_0 + \cdots + g_kx^k. $$
If $g_0 = 0$, then we choose as a new polynomial for $g(x)$
$$ g_1(x) = g_1 + \cdots + g_kx^{k-1}, $$
that is, $g(x)/x$. If $g_k = 0$, then we choose another polynomial
$$ g_2(x) = g_0 + \cdots + g_{k-1}x^{k-1}. $$
In either case our choice becomes more economical.

Theorem 37 divisible polynomial
A polynomial with coefficients in $B$ is divisible by $1 + x$ if and only if it has an even
number of non-zero terms.
Proof. Let $f(x) = a_0 + \cdots + a_nx^n$ with all $a_i$ in $B$, $i = 0,\dots,n$, and let $1 + x \mid f(x)$.
Then there exists a polynomial $b(x)$ in $B[x]$ such that
$$ f(x) = (1+x)\,b(x). $$
If $x = 1$, we have $a_0 + \cdots + a_n = 0$. Since the field $B$ is of characteristic 2, this is
only possible if the number of non-zero terms is even.
Conversely, let $f(x)$ have an even number of non-zero terms, say $f(x) = x^{i_1} + \cdots +
x^{i_{2k}}$, where $i_1 < \cdots < i_{2k}$. Rewrite this as
$$ f(x) = (x^{i_1} + x^{i_2}) + \cdots + (x^{i_{2k-1}} + x^{i_{2k}}). $$
For $i < j$, $x^i + x^j = x^i(1 + x^{j-i}) = x^i(1+x)(1 + x + \cdots + x^{j-i-1})$, which means
that $1 + x \mid x^i + x^j$. Therefore $1 + x$ divides all bracketed terms in $f(x)$, and hence
$1 + x \mid f(x)$. □
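Theorem 37 can be tested exhaustively on small polynomials. In this hypothetical Python sketch, a polynomial over $B$ is stored as an integer whose bit $i$ is the coefficient of $x^i$. Addition of polynomials is then exclusive or, and a remainder is computed by repeatedly cancelling the leading term.

```python
def poly_mod(f: int, g: int) -> int:
    """Remainder of f(x) / g(x) over B; bit i of an int is the coefficient of x^i."""
    dg = g.bit_length() - 1
    while f and f.bit_length() - 1 >= dg:
        f ^= g << (f.bit_length() - 1 - dg)   # subtract (= add) a shifted copy of g
    return f


def divisible_by_1_plus_x(f: int) -> bool:
    return poly_mod(f, 0b11) == 0            # 0b11 encodes 1 + x


def has_even_number_of_terms(f: int) -> bool:
    return bin(f).count("1") % 2 == 0


if __name__ == "__main__":
    # 1 + x^2 + x^3 + x^5 has four terms, so 1 + x should divide it
    f = 0b101101
    print(divisible_by_1_plus_x(f), has_even_number_of_terms(f))
```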
Theorem 38 minimum distance of a polynomial code
If $g(x) \in B[x]$ divides no polynomial of the form $x^k - 1$ for $k < n$, then
the binary polynomial code of length $n$ generated by $g(x)$ has minimum
distance at least 3.
Proof. Let $g(x) = g_0 + \cdots + g_rx^r$, where the $g_i$ are in $B$, $g_0 \ne 0$ and $g_r \ne 0$.
Let $b = n - r$. Suppose, contrary to the theorem, that the minimum distance is less than 3. Then,
the polynomial code being a group code, by Theorem 32 there exists a non-zero code polynomial $b(x)$ with at most two non-zero
terms. There are two cases to consider, namely $b(x) = x^i + x^j$, where
$i < j$, and $b(x) = x^i$, where $i < n$. In the first case, since $n$ is the
code length, we have $j < n$, hence $0 < j - i < n$. Since $g(x) \mid b(x)$ implies
$g(x) \mid x^i(1 + x^{j-i})$, and $g_0 \ne 0$ implies $x \nmid g(x)$, therefore $g(x) \mid 1 + x^{j-i}$, which
contradicts our hypothesis. In the second case, similarly to the above, $g(x) \mid x^i$ together with $x \nmid g(x)$ forces $g(x) = 1$, which divides $x - 1$,
and we again have a contradiction. □
Definition 64 matrix code
Let $C$ be a $(b,n)$-code. If there exists a $b \times n$ matrix $G$ of rank $b$ such that
$$ C = \{aG \mid a \in B^b\}, $$
then $G$ is called a generator matrix of the code $C$, and $C$ is called a matrix
code generated by $G$.

Definition 65 parity check matrix
Let $C$ be a $(b,n)$-code. If there exists an $(n-b) \times n$ matrix $H$ of rank $n-b$
such that
$$ Hb^T = 0 $$
for all $b$ in $C$, then $H$ is called a parity check matrix of $C$.
Theorem 39 polynomial code a matrix code
A polynomial code is a matrix code.
Proof. Let $C$ be a polynomial $(b,n)$-code with the encoding polynomial $g(x) =
g_0 + \cdots + g_kx^k$. Then $n = b + k$. Let $G$ be the $b \times n$ matrix whose first row
begins with the entries $g_0,\dots,g_k$ followed by $b-1$ zeros, and each of whose succeeding rows is a
cyclic shift one place to the right of the previous one, that is
$$ G = \begin{pmatrix} g_0 & g_1 & \cdots & g_k & 0 & \cdots & 0 \\ 0 & g_0 & g_1 & \cdots & g_k & \cdots & 0 \\ \vdots & & \ddots & & & \ddots & \vdots \\ 0 & \cdots & 0 & g_0 & g_1 & \cdots & g_k \end{pmatrix}. $$
The submatrix formed by the first $b$ columns is upper triangular with every diagonal entry equal to $g_0$, so its determinant is $g_0^b$, which is non-zero
since $g_0 \ne 0$. Thus the rank of $G$ is $b$. Let the original word to be
coded be $a = (a_0,\dots,a_{b-1})$. Then, since the code word $aG$ is the
coefficient vector of $a(x)g(x)$, the two codes are identical. □
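The construction in the proof can be checked for a particular generator polynomial. The Python sketch below is illustrative only. Here $g(x) = 1 + x + x^3$ is an example choice, and coefficient lists are written lowest degree first, as in Definition 63.

```python
from itertools import product


def poly_mul(a: list, g: list) -> list:
    """Coefficients, lowest degree first, of a(x) g(x) over B."""
    out = [0] * (len(a) + len(g) - 1)
    for i, ai in enumerate(a):
        for j, gj in enumerate(g):
            out[i + j] ^= ai & gj
    return out


def shift_matrix(g: list, b: int) -> list:
    """b x (b + k) matrix whose row i is g_0, ..., g_k shifted right by i places."""
    k = len(g) - 1
    n = b + k
    return [[0] * i + list(g) + [0] * (n - k - 1 - i) for i in range(b)]


def mat_encode(a: list, G: list) -> list:
    """Row vector a times G, modulo 2."""
    return [sum(ai * row[j] for ai, row in zip(a, G)) % 2 for j in range(len(G[0]))]


def codes_agree(g: list, b: int) -> bool:
    """True if aG equals the coefficients of a(x)g(x) for every message a in B^b."""
    G = shift_matrix(g, b)
    return all(mat_encode(list(a), G) == poly_mul(list(a), g)
               for a in product((0, 1), repeat=b))


if __name__ == "__main__":
    g = [1, 1, 0, 1]          # g(x) = 1 + x + x^3, an example choice
    for row in shift_matrix(g, 4):
        print(row)
    print(codes_agree(g, 4))
```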
Algorithm 4 Hamming codes

choose r a positive integer
b ← 2^r − r − 1
n ← 2^r − 1
for i = 1 to 2^r − 1 do
    (the i-th row of M) ← (the binary representation of i, as r digits)
endfor
for i = 0 to 2^b − 1 do
    (a_1, …, a_b) ← (the binary representation of i, as b digits)
    (b_j : j = 3, 5, 6, 7, 9, …, 2^r − 1, that is, j not a power of 2) ← (a_1, …, a_b)
    (b_{2^{j−1}} : j = 1, …, r) ← solve (bM = 0)
    the i-th code word ← (b_1, …, b_n)
endfor
Note 8 Hamming codes
Each code word in a Hamming code contains
$$ n - b = 2^r - 1 - (2^r - r - 1) = r $$
check digits. The value of $r$ is called the redundancy of the code.
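Algorithm 4 can be sketched in Python as below. This is one possible reading of the algorithm. It assumes that the check digits sit in positions $1, 2, 4, \dots, 2^{r-1}$ and that the message digits fill the remaining positions in order. The function names are illustrative. Each returned code word satisfies $bM = 0$. The minimum weight, which by Theorem 32 is the minimum distance, comes out as 3.

```python
from itertools import product


def hamming_code(r: int):
    """Code words of the binary Hamming code of length 2^r - 1, built as in Algorithm 4."""
    n = 2 ** r - 1
    b = n - r
    # Row i of M (i = 1, ..., n) is the binary representation of i, least significant digit first.
    M = [[(i >> j) & 1 for j in range(r)] for i in range(1, n + 1)]
    parity_positions = [2 ** j for j in range(r)]                 # 1, 2, 4, ...
    data_positions = [p for p in range(1, n + 1) if p not in parity_positions]
    code = []
    for msg in product((0, 1), repeat=b):
        word = [0] * (n + 1)                                      # index 0 unused: positions 1..n
        for p, bit in zip(data_positions, msg):
            word[p] = bit
        # Solving bM = 0 column by column: digit 2^j equals the sum of the data
        # digits whose position has bit j set.
        for j, p in enumerate(parity_positions):
            word[p] = sum(word[q] for q in data_positions if (q >> j) & 1) % 2
        code.append(tuple(word[1:]))
    return code, M


def satisfies_bM_zero(word: tuple, M: list) -> bool:
    """Check that the row vector word times M is zero modulo 2."""
    r = len(M[0])
    return all(sum(w * row[j] for w, row in zip(word, M)) % 2 == 0 for j in range(r))


def min_weight(code: list) -> int:
    """Smallest weight of a non-zero code word; equals d by Theorem 32."""
    return min(sum(c) for c in code if any(c))


if __name__ == "__main__":
    code, M = hamming_code(3)
    print(len(code), min_weight(code))
```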
Definition 66 coset
Let $G$ be a group and $H$ a subgroup of $G$. Then a coset of $H$ is either a
left coset $xH = \{xh : h \in H\}$ for some $x$ in $G$, or a right coset
$Hx = \{hx : h \in H\}$ for some $x$ in $G$.
Definition 67 lcm
Let polynomials $f_1(x),\dots,f_r(x)$ in $\mathbb{F}_q[x]$ be non-zero. Then the least common
multiple $\operatorname{lcm}(f_1(x),\dots,f_r(x))$ of $f_1(x),\dots,f_r(x)$ is the monic polynomial of
the lowest degree which is a multiple of all $f_i(x)$, $i = 1,\dots,r$.
Problem 6
Prove that for non-zero polynomials $f_1(x),\dots,f_r(x)$ in $\mathbb{F}_q[x]$,
$$ \operatorname{lcm}(f_1(x),\dots,f_r(x)) = \operatorname{lcm}\big(\operatorname{lcm}(f_1(x),\dots,f_{r-1}(x)),\, f_r(x)\big). $$
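Note 9
Let $f_1(x),\dots,f_r(x)$ in $\mathbb{F}_q[x]$ have the factorisations
$$ f_1(x) = a_1 (p_1(x))^{e_{11}} \cdots (p_n(x))^{e_{1n}}, $$
$$ \vdots $$
$$ f_r(x) = a_r (p_1(x))^{e_{r1}} \cdots (p_n(x))^{e_{rn}}, $$
where $a_1,\dots,a_r$ are in $\mathbb{F}_q^*$, $e_{ij} \ge 0$, and the $p_j(x)$ are distinct monic irreducible
polynomials over $\mathbb{F}_q$; then
$$ \operatorname{lcm}(f_1(x),\dots,f_r(x)) = (p_1(x))^{\max_i e_{i1}} \cdots (p_n(x))^{\max_i e_{in}}. $$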
Theorem 40
Let $f(x), f_1(x),\dots,f_r(x)$ be polynomials over $\mathbb{F}_q$. If $f(x)$ is divisible by every
polynomial $f_i(x)$, for $i = 1,\dots,r$, then $f(x)$ is also divisible by $\operatorname{lcm}(f_1(x),\dots,f_r(x))$.
Proof. Consider first the case where there are only two different polynomials,
$f_1(x)$ and $f_2(x)$. The prime components of $f_1(x)$ and $f_2(x)$ may be grouped
into those which are unique to one of them and those which are shared. Since
$$ f(x) = u_1(x) f_1(x) $$
and
$$ f(x) = u_2(x) f_2(x), $$
it follows that $f(x)$ contains both of these two groups of primes, each $p_j(x)$ with at least the larger of its exponents in $f_1(x)$ and $f_2(x)$. In other
words, by Note 9,
$$ f(x) = u(x)\operatorname{lcm}(f_1(x),f_2(x)). $$
Next, consider the case where there are more than two $f_i$'s. Suppose for $f(x)$
that
$$ f(x) = u_r(x)\operatorname{lcm}(f_1(x),\dots,f_r(x)). $$
Then if we let
$$ \tilde f(x) = \operatorname{lcm}(f_1(x),\dots,f_r(x)), $$
and if we introduce another polynomial $f_{r+1}(x)$ such that
$$ f(x) = u_{r+1}(x) f_{r+1}(x), $$
then, following the same line of reasoning as above for $\tilde f(x)$ and $f_{r+1}(x)$, together with Problem 6, we have
$$ \operatorname{lcm}(f_1(x),\dots,f_{r+1}(x)) \mid f(x). $$
□
Definition 68 subring
A non-empty subset $S$ of a ring $R$ is called a subring of $R$ if the elements of
$S$ form a ring with respect to the operations defined in $R$.

Theorem 41 subring
Let $R$ be a ring. Then a non-empty subset $S$ of $R$ is a subring if and only
if $S$ is closed under addition, multiplication, and the formation of additive
inverses.
Proof. A subring is a ring under the operations of $R$, so it has these closure properties. Conversely, since $S$ is a subset of $R$, additive associativity and
commutativity are inherited by $S$ from $R$. Since $S$ is non-empty and closed under the formation of additive inverses and under addition, for any $s$ in $S$ it contains $-s$ and hence $0 = s + (-s)$, so $S$ contains the additive identity and an additive inverse of each of its elements. Similarly, in the case of multiplication, both associativity
and distributivity are inherited from $R$ once we know that $S$ is closed under multiplication.
□
Definition 69 ideal of a ring
Let $R$ be a ring. We call an ideal in $R$ a subring $I$ having the property that
for all $i$ in $I$, both $xi$ and $ix$ are also in $I$ for every element $x$ in $R$.
Further, if $I$ is a proper subset of $R$, then it is called a proper ideal. By trivial
ideal one means either the zero ideal $\{0\}$ consisting of the zero element alone,
or the ring $R$ itself.


Note 10
The significance of the ideals in a ring is that they let us construct other rings
from the first. The cosets of an ideal $I$ in a ring $R$ form a partition of $R$ into equivalence classes,
which are non-empty and disjoint, and whose union is the whole of the ring
$R$.


Coding theory, Finite field- and BCH codes, 2”? December 2005 -6— From 8%” 
November 2005 , as of 14°" January, 2007 


Definition 70 congruence of a ring
Let $R$ be a ring and $I$ an ideal in it. Then two elements $x$ and $y$ in $R$ are said
to be congruent modulo $I$, denoted by
$$ x \equiv y \pmod{I}, $$
if $x - y$ is in $I$. Since there is only one ideal under consideration, we may write this congruence
simply as $x \equiv y$.
Note 12 addition and multiplication of congruences
Congruences can be added and multiplied as if they were ordinary equations.
In other words, if $x_1 \equiv x_2$ and $y_1 \equiv y_2$, then
$$ x_1 + y_1 \equiv x_2 + y_2 $$
and
$$ x_1y_1 \equiv x_2y_2. $$
Definition 71 coset of a ring


Let R be a ring, I an ideal of R, and x an element of R. Then the coset [x] containing x is the set of all elements y such that $y \equiv x$. Then,
$$\begin{aligned} [x] &= \{y : y \equiv x\} = \{y : y - x \in I\} \\ &= \{y : y - x = i \text{ for some } i \in I\} \\ &= \{y : y = x + i \text{ for some } i \in I\} \\ &= \{x + i : i \in I\} = x + I \end{aligned}$$
Furthermore, $[x] = [x_1]$ means that $x \equiv x_1$, that is to say, $x - x_1$ is in I. Here x and $x_1$ are called representatives of the coset which contains them.
Definition 72 quotient ring

A quotient ring, also known as a residue-class ring, factor ring or difference ring, is a ring of the form R/I, where R is a ring and I is one of its ideals. In other words, the quotient ring of R with respect to I is the ring
$$R/I = \{x + I : x \in R\}$$
where
$$x + I = \{x + i : i \in I\}$$
is the coset of an element x in R, and where addition and multiplication are defined as
$$[x] + [y] = [x + y]$$
and
$$[x] \cdot [y] = [xy]$$
Theorem 42


The zero element of R/I is 0 + I = I, and the negative of x + I is (−x) + I. If R is commutative, then R/I is also commutative. If R has an identity 1 and I is a proper ideal, then R/I has the identity 1 + I.
Theorem 43 quotient ring
Let R be a ring and I an ideal of R. Then, for x and y in R,
$$(x + I) + (y + I) = (x + y) + I$$
and
$$(x + I)(y + I) = xy + I$$
Proof. Let a and b be any two elements of the ideal I. Then,
$$(x + a) + (y + b) = x + a + y + b = (x + y) + (a + b) = (x + y) + p$$
where p = a + b is in I. Further,
$$(x + a)(y + b) = xy + xb + ay + ab = xy + c + d + e = xy + f$$
where c = xb, d = ay, e = ab and f = c + d + e are all elements of I, because I is closed under addition and under multiplication by elements of R on either side. □
Note 13
Theorem 43 and Note 12 show that the quotient ring R/I defined in Definition 72 is independent of the choice of x and y in the cosets x + I and y + I. In other words, the cosets [x + y] and [xy] resulting from addition and multiplication, respectively, in no way depend on the particular representatives x and y chosen for the cosets [x] and [y] that go into them. This means that, if $x_1 \equiv x$ and $y_1 \equiv y$, then
$$[x_1 + y_1] = [x + y]$$
and
$$[x_1 y_1] = [xy]$$
or equivalently
$$x_1 + y_1 \equiv x + y \quad\text{and}\quad x_1 y_1 \equiv xy$$
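A small numerical check of Note 13 may help. The sketch below assumes R = Z and I = 6Z (the ring Z_6 = Z/6Z of Example 31), identifies each coset with its least non-negative representative, and confirms that sums and products of cosets do not depend on which representatives are used. The function names are illustrative only.

```python
# Cosets of the ideal I = nZ in the ring Z (here n = 6, giving Z/6Z).
N = 6


def coset(x, n=N):
    """Identify the coset x + nZ with its least non-negative representative."""
    return x % n


def check_well_defined(n=N, shifts=range(-3, 4)):
    """Check [x1 + y1] = [x + y] and [x1 y1] = [x y] whenever x1 = x and y1 = y modulo nZ."""
    for x in range(n):
        for y in range(n):
            for i in shifts:
                for j in shifts:
                    x1, y1 = x + i * n, y + j * n   # other representatives of [x] and [y]
                    if coset(x1 + y1, n) != coset(x + y, n):
                        return False
                    if coset(x1 * y1, n) != coset(x * y, n):
                        return False
    return True


print(check_well_defined())   # True
```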
Example 31
Some examples of quotient rings are Z_2 = Z/2Z and Z_6 = Z/6Z.


Theorem 44 polynomial ring 
The polynomial ring F[x] over a field F is a commutative ring with identity.
Proof. F[x] is a ring since under addition it is closed, associative and commutative, with 0 as the identity and −f(x) as the inverse of each f(x) in F[x]; and under multiplication it is closed, associative, commutative and distributive over addition, with 1 as the identity. □
Definition 73 principal ideal


Let R be a commutative ring with identity. Then for any a in R the principal ideal generated by a is
$$(a) = aR = \{ar : r \in R\}$$
Further, R is called a principal ideal ring if all its ideals are of this form.
Theorem 45 polynomial ring a principal ideal ring
Let F be a field. Then the polynomial ring F[x] is a principal ideal ring.
Proof. The polynomial ring F[x] being a commutative ring with identity (Theorem 44), it remains only to show that all its ideals are of the form
$$(a) = aR = \{ar : r \in R\}$$
where R = F[x] and a is in R. Let I be an ideal of F[x]. If I = 0, then I is the principal ideal generated by 0. If I ≠ 0, then choose $0 \ne f(x) \in I$ such that
$$\deg f \le \deg g$$
for all non-zero g(x) in I. For any g(x) in I, we first write, by the division algorithm,
$$g(x) = q(x)f(x) + r(x)$$
with either r = 0 or deg r < deg f. If deg g < deg f, then q = 0 and r = g. On the other hand, if n = deg f ≤ deg g = m, we argue by induction on m. Let
$$f(x) = a_0x^n + \cdots + a_n \quad\text{and}\quad g(x) = b_0x^m + \cdots + b_m$$
Then, with $a_0 \ne 0$,
$$g(x) = a_0^{-1}b_0x^{m-n}f(x) + g_1(x) \tag{23}$$
where $\deg g_1 \le m - 1$. Then, by the induction hypothesis,
$$g_1(x) = q_1(x)f(x) + r(x) \tag{24}$$
where either r = 0 or deg r < deg f. From Equations 23 and 24,
$$g(x) = q(x)f(x) + r(x) \quad\text{where}\quad q(x) = a_0^{-1}b_0x^{m-n} + q_1(x)$$
is in F[x]. Now r(x) = g(x) − q(x)f(x) is in I. If r ≠ 0, then deg r < deg f, which contradicts our choice of f(x). Therefore g = qf, and I is the principal ideal generated by f(x). □
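The division step of Equations 23 and 24 is ordinary long division of polynomials. Below is a minimal sketch over a prime field F_p, assuming polynomials are stored as coefficient lists with the constant term first; `poly_divmod` and `trim` are hypothetical helper names, and p is assumed prime so that the leading coefficient $a_0$ can be inverted by Fermat's little theorem.

```python
def trim(a):
    """Drop zero leading coefficients (stored at the end of the list)."""
    a = list(a)
    while a and a[-1] == 0:
        a.pop()
    return a


def poly_divmod(g, f, p):
    """Divide g(x) by f(x) over the prime field F_p.

    Polynomials are coefficient lists, constant term first: [1, 0, 1] is 1 + x^2.
    Returns (q, r) with g = q f + r and either r = [] (zero) or deg r < deg f.
    """
    f = trim(c % p for c in f)
    if not f:
        raise ZeroDivisionError("division by the zero polynomial")
    r = [c % p for c in g]
    q = [0] * max(len(r) - len(f) + 1, 1)
    lead_inv = pow(f[-1], p - 2, p)          # inverse of the leading coefficient a_0 of f
    while True:
        r = trim(r)
        if len(r) < len(f):                  # r = 0 or deg r < deg f: done
            return trim(q), r
        shift = len(r) - len(f)              # m - n, the degree of the next quotient term
        coeff = r[-1] * lead_inv % p         # a_0^{-1} b_0, as in Equation 23
        q[shift] = coeff
        for k, fk in enumerate(f):           # subtract coeff * x^shift * f(x)
            r[shift + k] = (r[shift + k] - coeff * fk) % p
```

For instance, `poly_divmod([1, 0, 0, 0, 0, 0, 0, 1], [1, 0, 1, 1], 2)` divides $x^7 + 1$ by $1 + x^2 + x^3$ over F_2 and returns `([1, 0, 1, 1, 1], [])`, that is quotient $1 + x^2 + x^3 + x^4$ and remainder 0.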
Definition 74 reducible polynomial


Let R be a commutative ring with identity. Then a non-constant f(x) in R[x] is said to be irreducible if, whenever
$$f(x) = g(x)h(x)$$
for some g(x) and h(x) in R[x], either deg g(x) = 0 or deg h(x) = 0. Otherwise f(x) is said to be reducible.
Theorem 46 quotient ring a field


Let F be a field and f(x) in F[x] an irreducible polynomial. Then F[x]/(f(x)) is a field.


Proof. Let I be the ideal (f(x)) of F[x] generated by f(x). If I = F[x], then f(x) has an inverse, that is, 1 = f(x)g(x) for some g(x) in F[x]. Then f(x) is a constant polynomial, which contradicts the irreducibility of f(x). Therefore F[x]/I has at least two elements, and F[x]/I, being a quotient of the commutative ring with identity F[x], is a commutative ring with identity (Theorem 42).
Let $g \in F[x]$ with $g \notin I$. Then
$$J = \{a(x)f(x) + b(x)g(x) : a(x), b(x) \in F[x]\}$$
is an ideal of F[x], and by Theorem 45 there exists h(x) in F[x] such that J = (h(x)). But
$$f(x) = 1 \cdot f(x) + 0 \cdot g(x)$$
is in J, and thus f(x) = a(x)h(x) for some a(x) in F[x]. The polynomial f(x) being irreducible, either deg h(x) = 0 or deg a(x) = 0. If the latter is the case, then a(x) is a unit in F[x], so $h(x) = a(x)^{-1}f(x)$ is in I, hence J = I, and hence a contradiction, since g is in J but not in I. Therefore it must be the case that h(x) is a unit in F[x], hence J = F[x], and thus
$$1 = a(x)f(x) + b(x)g(x)$$
for some a(x) and b(x) in F[x]. And then
$$1 + I = b(x)g(x) + I = (b(x) + I)(g(x) + I)$$
Thus g(x) + I has an inverse and F[x]/I is a field. □
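The proof is constructive: the identity 1 = a(x)f(x) + b(x)g(x) delivers the inverse b(x) + I of g(x) + I. A minimal sketch for F = F_2, assuming binary polynomials are encoded as Python integers (bit k holds the coefficient of $x^k$), runs the extended Euclidean algorithm and keeps track of b(x) only. The example modulus $1 + x^2 + x^3$ (0b1101) is the irreducible polynomial of Quiz 1; all helper names are hypothetical.

```python
def deg(a):
    """Degree of a binary polynomial stored as an int; the zero polynomial gets -1."""
    return a.bit_length() - 1


def gf2_divmod(a, b):
    """Quotient and remainder of a(x) by b(x) in F_2[x]."""
    q = 0
    while a and deg(a) >= deg(b):
        shift = deg(a) - deg(b)
        q ^= 1 << shift
        a ^= b << shift                  # subtract (= add) x^shift * b(x)
    return q, a


def gf2_mul(a, b):
    """Product of a(x) and b(x) in F_2[x] (carry-less multiplication)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def inverse_mod(g, f):
    """Inverse of g(x) + (f(x)) in F_2[x]/(f(x)), from 1 = a(x) f(x) + b(x) g(x)."""
    r0, r1 = f, gf2_divmod(g, f)[1]      # invariant: r_i = b_i * g (mod f)
    b0, b1 = 0, 1
    while r1:
        q, rem = gf2_divmod(r0, r1)
        r0, r1 = r1, rem
        b0, b1 = b1, b0 ^ gf2_mul(q, b1)
    if r0 != 1:                          # gcd is not 1, so g has no inverse modulo f
        raise ValueError("g(x) is not invertible modulo f(x)")
    return gf2_divmod(b0, f)[1]


F = 0b1101                               # 1 + x^2 + x^3, irreducible over F_2
print(inverse_mod(0b010, F))             # 6, i.e. x + x^2 is the inverse of x
```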
Definition 75 extension of a field


Let K be a field and F a subfield of K. Then K is called an extension of the field F, denoted by
$$K|F$$
Since elements of K can be added together and multiplied by elements of F, K is a vector space over F. The dimension of the vector space K over F is called the degree
$$[K : F]$$
of the extension K|F. The extension K|F is said to be finite if the degree [K : F] is finite.
Definition 76 prime subfield


The prime subfield of a field F is the intersection of all subfields of F. It is the smallest of all subfields of F, and it is unique. A prime field is a field which has no proper subfields.
Definition 77 minimal polynomial
Let K|F be an extension of a field F. Then $\alpha \in K$ is said to be algebraic over F if there exists a non-zero f(x) in F[x] which has α as a root. Let α in K be algebraic over F and consider
$$A = \{f(x) \in F[x] : f(\alpha) = 0\}$$
Here A is an ideal of the principal ideal domain F[x]. Let $m_1(x)$ in F[x] be a generator of A. If a is the coefficient of the highest power of x in $m_1(x)$, then $m(x) = a^{-1}m_1(x)$ is a monic polynomial with $\deg m(x) = \deg m_1(x)$, and m(x) is also a generator of A. Let m(x) = r(x)s(x) for some r(x) and s(x) in F[x]. Then, since K has no zero divisors, either r(α) = 0 or s(α) = 0, that is, either m(x) | r(x) or m(x) | s(x). But deg m = deg r + deg s, so m(x) | r(x) forces deg s(x) = 0, and m(x) | s(x) forces deg r(x) = 0. Hence m(x) is irreducible. Since m(x) is monic, irreducible and of the least degree possible while admitting α as a root, m(x) is called the minimal polynomial of α over F.
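For small fields the minimal polynomial can be found straight from the definition, by searching the monic polynomials of $F_2[x]$ in order of increasing degree for the first one having the given element as a root. The sketch assumes GF(16) is built as $F_2[x]/(x^4 + x + 1)$, with field elements and binary polynomials both stored as integer bit-vectors; this choice of modulus and all names are illustrative assumptions.

```python
M = 4
MOD = 0b10011            # x^4 + x + 1, assumed irreducible modulus defining GF(16)


def gf_mul(a, b):
    """Multiply two elements of GF(2^M) stored as bit-vectors of coefficients."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> M & 1:               # degree reached M: reduce modulo MOD
            a ^= MOD
    return result


def gf_pow(a, e):
    result = 1
    for _ in range(e):
        result = gf_mul(result, a)
    return result


def evaluate(poly, beta):
    """Evaluate a polynomial of F_2[x] (bit k = coefficient of x^k) at beta by Horner's rule."""
    acc = 0
    for k in range(poly.bit_length() - 1, -1, -1):
        acc = gf_mul(acc, beta)
        if poly >> k & 1:
            acc ^= 1
    return acc


def minimal_polynomial(beta):
    """Monic f in F_2[x] of least degree with f(beta) = 0, by exhaustive search."""
    for d in range(1, M + 1):
        for low in range(1 << d):        # every monic polynomial of degree d
            f = (1 << d) | low
            if evaluate(f, beta) == 0:
                return f
    raise ValueError("beta is not an element of GF(2^M)")


alpha = 0b10                             # the class of x
print(bin(minimal_polynomial(gf_pow(alpha, 3))))   # 0b11111
```

The printed value 0b11111 stands for $x^4 + x^3 + x^2 + x + 1$, the minimal polynomial of $\alpha^3$ for this choice of modulus.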
Theorem 47 linear code correction capability
Let C be an (n, k) linear code over $F_q$ with parity-check matrix H, and let d(C) be the smallest number of columns of H that are linearly dependent. Then, if every subset of 2t or fewer columns of H is linearly independent, the code is capable of correcting all error patterns of weight w ≤ t.


Proof. When q = 2, linear dependence amounts to some non-empty set of columns summing to 0. The code words of C are those vectors x in $V_n(F_q)$ for which
$$Hx^T = 0$$
But $Hx^T$ is a linear combination of the columns of H; that is to say, if $H = [c_1 \cdots c_n]$, then $Hx^T = x_1c_1 + \cdots + x_nc_n$. Hence a non-zero code word of weight w gives a nontrivial linear dependence among w columns of H, and vice versa. So if every 2t or fewer columns of H are linearly independent, every non-zero code word has weight at least 2t + 1, the minimum distance of C is at least 2t + 1, and C corrects all error patterns of weight at most t. □
Corollary 47[1] Hamming as special case


If q = 2 and all possible linear combinations of up to e columns of H are distinct, then
$$d(C) \ge 2e + 1$$
and C can then correct all patterns of weight e or less.
Note 14 from Hamming to BCH code
Hamming codes correct single errors. An extension of these is the Bose-Chaudhuri-Hocquenghem (BCH) codes, which can correct multiple errors. In the case of the Hamming code of length $n = 2^m - 1$, the parity-check matrix is given by
$$H = [\,v_0 \ \ v_1 \ \cdots \ v_{n-1}\,]$$
where $(v_0, \ldots, v_{n-1})$ is some ordering of the $2^m - 1$ non-zero column vectors in $V_m = V_m(F_2)$. The m × n matrix H uses m parity-check bits for the code to be able to correct one error. We may extend H so that it has m more rows and can correct two errors. Then,
$$H_2 = \begin{pmatrix} v_0 & \cdots & v_{n-1} \\ w_0 & \cdots & w_{n-1} \end{pmatrix}$$
where $w_0, \ldots, w_{n-1}$ are in $V_m$.
Since the $v_i$ are distinct, we may regard the mapping from $v_i$ to $w_i$ as a function f from $V_m$ into itself; then
$$H_2 = \begin{pmatrix} v_0 & \cdots & v_{n-1} \\ f(v_0) & \cdots & f(v_{n-1}) \end{pmatrix}$$
Then $H_2$ will define a code which corrects two errors if and only if the syndromes of the $1 + n + \binom{n}{2}$ error patterns of weights 0, 1 and 2 are all distinct. Any such syndrome is a sum of a subset of columns of $H_2$, and therefore a vector in $V_{2m}$. Let the syndrome be $s = (s_1, \ldots, s_{2m}) = (s', s'')$, where $s' = (s_1, \ldots, s_m)$ and $s'' = (s_{m+1}, \ldots, s_{2m})$ are both in $V_m$. Defining f(0) = 0, we consider a pair of errors occurring at the i-th and j-th positions, which gives $s = (v_i + v_j,\ f(v_i) + f(v_j))$. Then the system of equations
$$u + v = s', \qquad f(u) + f(v) = s''$$
must have at most one solution {u, v} for each pair of vectors s', s'' from $V_m$.
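By trial and error we may find that neither a linear mapping f(v) = Tv nor a polynomial of degree 2 works, but that $f(v) = v^3$ does, where $V_m$ is identified with the field $F_{2^m}$. Writing $(\alpha_0, \ldots, \alpha_{n-1})$ for the ordering of the non-zero elements of $F_{2^m}$, the matrix
$$H_2 = \begin{pmatrix} \alpha_0 & \alpha_1 & \cdots & \alpha_{n-1} \\ \alpha_0^3 & \alpha_1^3 & \cdots & \alpha_{n-1}^3 \end{pmatrix}$$
is the parity-check matrix of a binary code of length $n = 2^m - 1$ which corrects up to two errors. A vector $c = (c_0 \cdots c_{n-1})$ in $V_n(F_2)$ is a code word in the code defined by $H_2$ if and only if
$$\sum_{i=0}^{n-1} c_i\alpha_i = \sum_{i=0}^{n-1} c_i\alpha_i^3 = 0$$
Since the 2m rows of the matrix $H_2$, written over $F_2$, may not all be linearly independent, the dimension of the code satisfies
$$k \ge n - 2m = 2^m - 1 - 2m$$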
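The two-error condition above can be checked mechanically for m = 4. Assuming GF(16) = $F_2[x]/(x^4 + x + 1)$ and $v_i = \alpha^i$ with α the class of x, the sketch packs each column $(v_i, f(v_i))$ of $H_2$ into an 8-bit integer, tests whether the 1 + 15 + 105 error patterns of weight at most 2 have distinct syndromes for $f(v) = v^3$ and for $f(v) = v^2$, and computes the rank of $H_2$ over $F_2$. The modulus and all names are assumptions for illustration.

```python
from itertools import combinations

M = 4
MOD = 0b10011                     # x^4 + x + 1 (assumed), so GF(16) = F_2[x]/(MOD)
N = 2 ** M - 1
ALPHA = 0b10                      # the class of x, a primitive element for this modulus


def gf_mul(a, b):
    """Multiply two elements of GF(2^M) stored as bit-vectors."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> M & 1:
            a ^= MOD
    return result


def gf_pow(a, e):
    result = 1
    for _ in range(e):
        result = gf_mul(result, a)
    return result


def build_columns(exponent):
    """Column i of H_2 is (v_i, f(v_i)), v_i = alpha^i, f(v) = v^exponent, packed into 2M bits."""
    return [(gf_pow(ALPHA, i) << M) | gf_pow(ALPHA, exponent * i) for i in range(N)]


def syndromes_distinct(columns):
    """True if the error patterns of weight 0, 1 and 2 all have different syndromes."""
    patterns = [()] + [(i,) for i in range(N)] + list(combinations(range(N), 2))
    seen = set()
    for pattern in patterns:
        s = 0
        for i in pattern:
            s ^= columns[i]       # the syndrome is the sum of the columns at the error positions
        if s in seen:
            return False
        seen.add(s)
    return True


def gf2_rank(vectors):
    """Rank over F_2 of a list of bit-vectors (elimination on leading bits)."""
    pivots = {}
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v
                break
            v ^= pivots[top]
    return len(pivots)


H2 = build_columns(3)
print(syndromes_distinct(H2), syndromes_distinct(build_columns(2)), gf2_rank(H2))   # True False 8
```

The squaring map fails because $u^2 + v^2 = (u + v)^2$ in characteristic 2, so the lower half of the syndrome carries no new information. With $f(v) = v^3$ the rank is 8, so in this case the code has dimension exactly 15 − 8 = 7.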
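Definition 78 Vandermonde matrix
The Vandermonde matrix of the elements $a_1, \ldots, a_r$ is defined as
$$A = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ a_1 & a_2 & \cdots & a_r \\ \vdots & \vdots & & \vdots \\ a_1^{r-1} & a_2^{r-1} & \cdots & a_r^{r-1} \end{pmatrix}$$


Theorem 48 Vandermonde matrix


Let $a_1, \ldots, a_r$ be distinct non-zero elements of a field. Then the Vandermonde matrix is such that
$$\det A = \begin{vmatrix} 1 & 1 & \cdots & 1 \\ a_1 & a_2 & \cdots & a_r \\ \vdots & \vdots & & \vdots \\ a_1^{r-1} & a_2^{r-1} & \cdots & a_r^{r-1} \end{vmatrix} \ne 0$$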
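Proof. Replacing row(i + 1) by row(i + 1) − $a_1$ row(i), for i = r − 1, r − 2, ..., 1 in turn, yields
$$\det A = \begin{vmatrix} 1 & 1 & \cdots & 1 \\ 0 & a_2 - a_1 & \cdots & a_r - a_1 \\ 0 & a_2(a_2 - a_1) & \cdots & a_r(a_r - a_1) \\ \vdots & \vdots & & \vdots \\ 0 & a_2^{r-2}(a_2 - a_1) & \cdots & a_r^{r-2}(a_r - a_1) \end{vmatrix} = (a_2 - a_1)\cdots(a_r - a_1)\begin{vmatrix} 1 & \cdots & 1 \\ a_2 & \cdots & a_r \\ \vdots & & \vdots \\ a_2^{r-2} & \cdots & a_r^{r-2} \end{vmatrix}$$
Applying the same step to the smaller Vandermonde determinant gives
$$\det A = (a_2 - a_1)\cdots(a_r - a_1)\,(a_3 - a_2)\cdots(a_r - a_2)\begin{vmatrix} 1 & \cdots & 1 \\ a_3 & \cdots & a_r \\ \vdots & & \vdots \\ a_3^{r-3} & \cdots & a_r^{r-3} \end{vmatrix} = \cdots = \prod_{1 \le i < j \le r}(a_j - a_i)$$
Then, since the $a_i$ are distinct, every factor $a_j - a_i$ is non-zero, and therefore det A is non-zero. □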
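Theorem 48 and the product formula reached in its proof can be spot-checked with exact rational arithmetic. The sketch computes det A by Gaussian elimination using Python's `fractions.Fraction` and compares it with $\prod_{i<j}(a_j - a_i)$; the sample values are arbitrary and the function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def det(matrix):
    """Determinant by Gaussian elimination with exact rational arithmetic."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            result = -result              # a row interchange changes the sign
        result *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            a[r] = [x - factor * y for x, y in zip(a[r], a[c])]
    return result


def vandermonde(values):
    """Row k holds a_1^k, ..., a_r^k for k = 0, ..., r - 1, as in Definition 78."""
    r = len(values)
    return [[a ** k for a in values] for k in range(r)]


def product_formula(values):
    """The product of (a_j - a_i) over all pairs i < j."""
    p = 1
    for i, j in combinations(range(len(values)), 2):
        p *= values[j] - values[i]
    return p


print(det(vandermonde([2, 3, 5, 7])), product_formula([2, 3, 5, 7]))   # 240 240
```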
Theorem 49 linear independence


Any square matrix having a non-zero determinant has all its columns linearly independent.


Proof. Let A be an r × r matrix with |A| ≠ 0, and suppose the columns of A are linearly dependent. Then one may write some column $c_j$ of A as a linear combination of the others, for example
$$c_j = \sum_{i \ne j} \alpha_i c_i$$
Then replacing column $c_j$ by $c_j - \sum_{i \ne j} \alpha_i c_i$ gives a matrix B with |B| = |A|. But B has a column all of whose elements are zero, which means that
$$|A| = |B| = 0$$
a contradiction, which completes the proof. □
Theorem 50 BCH code
Let $(\alpha_0, \ldots, \alpha_{n-1})$ be an ordering of the non-zero elements of $F_{2^m}$, so that $n = 2^m - 1$, and let t be a positive integer such that
ae ee 
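Then the matrix
$$H = \begin{pmatrix} \alpha_0 & \alpha_1 & \cdots & \alpha_{n-1} \\ \alpha_0^3 & \alpha_1^3 & \cdots & \alpha_{n-1}^3 \\ \vdots & \vdots & & \vdots \\ \alpha_0^{2t-1} & \alpha_1^{2t-1} & \cdots & \alpha_{n-1}^{2t-1} \end{pmatrix}$$
is the parity-check matrix of a binary (n, k)-code capable of correcting all error patterns of weight w ≤ t, with dimension
$$k \ge n - mt$$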
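Proof. A vector $c = (c_0, \ldots, c_{n-1})$ in $V_n(F_2)$ is a code word if and only if $Hc^T = 0$. Thus,
$$\sum_{i=0}^{n-1} c_i\alpha_i^{\,j} = 0$$
for j = 1, 3, ..., 2t − 1. We simplify this by using the fact that $(x + y)^2 = x^2 + y^2$ in characteristic 2, and that $x^2 = x$ in $F_2$. Hence,
$$\sum_{i=0}^{n-1} c_i\alpha_i^{\,2j} = \sum_{i=0}^{n-1} c_i^2\alpha_i^{\,2j} = \Big(\sum_{i=0}^{n-1} c_i\alpha_i^{\,j}\Big)^2 = 0$$
whenever the sum with exponent j vanishes. Since every j in {1, 2, ..., 2t} is of the form $2^e j'$ with j' odd and j' ≤ 2t − 1, repeated squaring gives us
$$\sum_{i=0}^{n-1} c_i\alpha_i^{\,j} = 0$$
for j = 1, 2, ..., 2t.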
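Therefore we could also use the parity-check matrix
$$H' = \begin{pmatrix} \alpha_0 & \alpha_1 & \cdots & \alpha_{n-1} \\ \alpha_0^2 & \alpha_1^2 & \cdots & \alpha_{n-1}^2 \\ \vdots & \vdots & & \vdots \\ \alpha_0^{2t} & \alpha_1^{2t} & \cdots & \alpha_{n-1}^{2t} \end{pmatrix}$$
According to Theorem 47, H' is the parity-check matrix of a code which corrects t errors if every subset of 2t or fewer columns of H' is linearly independent. Next, a subset of r ≤ 2t columns of H' has the form
$$A = \begin{pmatrix} \beta_1 & \cdots & \beta_r \\ \beta_1^2 & \cdots & \beta_r^2 \\ \vdots & & \vdots \\ \beta_1^{2t} & \cdots & \beta_r^{2t} \end{pmatrix}$$
where $\beta_1, \ldots, \beta_r$ are distinct non-zero elements of $F_{2^m}$, so we may consider the r × r matrix formed by the first r rows of A,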
$$A' = \begin{pmatrix} \beta_1 & \cdots & \beta_r \\ \beta_1^2 & \cdots & \beta_r^2 \\ \vdots & & \vdots \\ \beta_1^r & \cdots & \beta_r^r \end{pmatrix}$$
which is non-singular since its determinant, by the Vandermonde determinant theorem, Theorem 48, is
$$\det A' = \beta_1\cdots\beta_r\begin{vmatrix} 1 & \cdots & 1 \\ \beta_1 & \cdots & \beta_r \\ \vdots & & \vdots \\ \beta_1^{r-1} & \cdots & \beta_r^{r-1} \end{vmatrix} = \beta_1\cdots\beta_r\prod_{i<j}(\beta_j - \beta_i) \ne 0$$
Then the columns of A', and hence those of A, cannot be linearly dependent, and therefore the code corrects all error patterns of weight up to t. Now H, as a matrix with entries from $F_2$ rather than $F_{2^m}$, has dimensions mt × n; hence the dual code has dimension at most mt, and the code has dimension k ≥ n − mt. □
Theorem 51 minimum distance of linear code

Let C be a linear (n, k)-code over GF(q) with parity-check matrix H. Then the minimum distance of C is d if and only if any d − 1 columns of H are linearly independent but some d columns are linearly dependent.


Proof. The minimum distance d(C) of a linear code is equal to the minimum of the weights of its non-zero code words. Let $x = x_1 \cdots x_n$ be a vector in V(n, q). Then x is in C if and only if $xH^T = 0$, if and only if $x_1h_1 + \cdots + x_nh_n = 0$, where $h_1, \ldots, h_n$ are the columns of H. Therefore each code word x of weight d corresponds to a set of d linearly dependent columns of H. On the other hand, if there existed a set of d − 1 linearly dependent columns $h_{i_1}, \ldots, h_{i_{d-1}}$ of H, then there would exist scalars $x_{i_1}, \ldots, x_{i_{d-1}}$, not all zero, such that
$$\sum_{j=1}^{d-1} x_{i_j}h_{i_j} = 0$$
But if this were the case, then the word x having these entries in positions $i_1, \ldots, i_{d-1}$ and 0 elsewhere would satisfy $xH^T = 0$, and so would be a code word of weight w with 0 < w < d = d(C), a contradiction. □
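For a small code, Theorem 51 can be confirmed by brute force. The sketch assumes the binary [7, 4] Hamming code, whose parity-check matrix H has as columns the binary representations of 1, ..., 7 (stored as integers); it computes the minimum weight of the non-zero code words and the least number of columns of H that sum to 0, which over $F_2$ is the least number of linearly dependent columns. Names are illustrative.

```python
from itertools import combinations, product

N = 7
COLS = list(range(1, N + 1))   # columns of H as 3-bit integers (assumed [7, 4] Hamming code)


def syndrome(x):
    """x H^T for a binary word x, returned as a 3-bit integer."""
    s = 0
    for bit, col in zip(x, COLS):
        if bit:
            s ^= col
    return s


def minimum_distance():
    """Least weight of a non-zero code word."""
    return min(sum(x) for x in product((0, 1), repeat=N) if any(x) and syndrome(x) == 0)


def smallest_dependent_set():
    """Least d such that some d columns of H sum to zero."""
    for d in range(1, N + 1):
        for cols in combinations(COLS, d):
            s = 0
            for c in cols:
                s ^= c
            if s == 0:
                return d
    return None


print(minimum_distance(), smallest_dependent_set())   # 3 3
```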
Theorem 52 Singleton bound


The maximum dictionary size $A_q(n, d)$, that is, the largest m such that there exists a q-ary (n, m, d)-code, satisfies
$$A_q(n, d) \le q^{\,n-d+1}$$


Proof. Let C be a q-ary (n, m, d)-code. If we remove the last d − 1 coordinates from each code word, then the m vectors of length n − d + 1 so obtained must be distinct, since otherwise two code words would differ in at most d − 1 coordinates and d(C) would be less than d, contradicting the assumption. Therefore $m \le q^{\,n-d+1}$. □
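Theorem 53 bound for BCH codes


Let q be a prime number, and let C be the code over GF(q) defined to have the parity-check matrix
$$H = \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & 2 & 3 & \cdots & n \\ 1 & 2^2 & 3^2 & \cdots & n^2 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 2^{d-2} & 3^{d-2} & \cdots & n^{d-2} \end{pmatrix}$$
where d ≤ n ≤ q − 1. Then
$$A_q(n, d) = q^{\,n-d+1}$$
The same holds when q is a prime power, with 1, 2, ..., n replaced by n distinct non-zero elements of GF(q).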
Proof. We have,
$$C = \Big\{x_1 \cdots x_n \in V(n, q) : \sum_{i=1}^{n} i^{\,j}x_i = 0 \text{ for } j = 0, 1, \ldots, d - 2\Big\}$$
Any d − 1 columns of H form a Vandermonde matrix in distinct non-zero elements of GF(q), and therefore, by Theorems 48 and 49, are linearly independent; since H has only d − 1 rows, any d columns are linearly dependent. By Theorem 51, C has minimum distance d, and since H has rank d − 1, C is a q-ary $(n, q^{\,n-d+1}, d)$-code. The proof follows since C meets the Singleton bound of Theorem 52. □
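As an illustration of Theorem 53, the sketch below enumerates the whole code for the assumed parameters q = 7, n = 6 and d = 5, using the parity-check matrix H above with rows $(1^j, 2^j, \ldots, n^j)$ for j = 0, ..., d − 2. It counts the code words and finds the minimum distance, to be compared with the Singleton value $q^{n-d+1}$. Enumerating all $7^6$ words is feasible only because the example is tiny.

```python
from itertools import product

Q, N, D = 7, 6, 5          # assumed example: q = 7 prime and d <= n <= q - 1

# Row j of the parity-check matrix is (1^j, 2^j, ..., n^j) mod q, for j = 0, ..., d - 2.
H = [[pow(i, j, Q) for i in range(1, N + 1)] for j in range(D - 1)]


def is_codeword(x):
    """True if x H^T = 0 over GF(Q)."""
    return all(sum(h * xi for h, xi in zip(row, x)) % Q == 0 for row in H)


code = [x for x in product(range(Q), repeat=N) if is_codeword(x)]
min_dist = min(sum(1 for xi in x if xi) for x in code if any(x))
print(len(code), Q ** (N - D + 1), min_dist)   # 49 49 5
```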


Problem 9 decoding BCH 
Find the decoding procedure for the BCH codes. 


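Solution. Assume that d = 2t + 1 and H has 2t rows. Suppose the code word $c = c_1 \cdots c_n$ is transmitted and the vector $r = r_1 \cdots r_n$ is received.


Assuming that at most t errors have occurred, let $x_1, \ldots, x_t$ be their positions and $m_1, \ldots, m_t$ their respective magnitudes. Then the syndrome is
$$(s_1, \ldots, s_{2t}) = rH^T$$
and, for the parity-check matrix of Theorem 53, in which position i corresponds to the field element i, we have
$$s_j = \sum_{i=1}^{n} r_i\, i^{\,j-1} = \sum_{i=1}^{t} m_i x_i^{\,j-1} \tag{25}$$
for j = 1, ..., 2t.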
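Then from
$$S(\theta) = \sum_{i=1}^{t} \frac{m_i}{1 - x_i\theta} \tag{26}$$
and
$$\frac{m_i}{1 - x_i\theta} = m_i\left(1 + x_i\theta + x_i^2\theta^2 + \cdots\right)$$
together with Equation 25, we have
$$S(\theta) = s_1 + s_2\theta + \cdots + s_{2t}\theta^{2t-1} + \cdots$$
Also, bringing the sum in Equation 26 over the common denominator $(1 - x_1\theta)\cdots(1 - x_t\theta)$, we have
$$S(\theta) = \frac{a_1 + a_2\theta + a_3\theta^2 + \cdots + a_t\theta^{t-1}}{1 + b_1\theta + b_2\theta^2 + \cdots + b_t\theta^t} \tag{27}$$
Hence,
$$\left(s_1 + s_2\theta + s_3\theta^2 + \cdots\right)\left(1 + b_1\theta + b_2\theta^2 + \cdots + b_t\theta^t\right) = a_1 + a_2\theta + \cdots + a_t\theta^{t-1}$$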
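Comparing the coefficients of $\theta^{k-1}$, and writing $b_0 = 1$, gives us
$$a_1 = s_1 \quad\text{and}\quad a_k = \sum_{j=0}^{k-1} s_{k-j}b_j, \qquad 1 \le k \le t \tag{28}$$
and
$$0 = \sum_{j=0}^{t} s_{k-j}b_j, \qquad k = t + 1, \ldots, 2t \tag{29}$$


With the $a_i$ and $b_i$ known, we may expand the right-hand side of Equation 27 in partial fractions
$$\frac{p_1}{1 - q_1\theta} + \cdots + \frac{p_t}{1 - q_t\theta}$$
and therefore, comparing with Equation 26, $m_i = p_i$ and $x_i = q_i$ for i = 1, ..., t, and the system in Equation 25 is solved. Algorithm 5 then gives the procedure for error correction.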


# 
Note 15
The polynomial
$$\sigma(\theta) = 1 + b_1\theta + b_2\theta^2 + \cdots + b_t\theta^t = (1 - x_1\theta)\cdots(1 - x_t\theta) \tag{30}$$
can be used to find the locations of the errors, since its zeros are the inverses $x_i^{-1}$ of the error positions. The polynomial
$$\omega(\theta) = a_1 + a_2\theta + \cdots + a_t\theta^{t-1}$$
can be used to find the magnitudes of the errors.
Algorithm 5 Procedure for correcting up to t errors in BCH codes


input: r
find $s_1, \ldots, s_{2t}$
e ← maximum number of linearly independent equations in Equation 29
for i = e + 1 to t do
    $b_i$ ← 0
endfor
$(b_1, \ldots, b_e)$ ← solve the first e equations of Equation 29
$(z_1, \ldots, z_e)$ ← find the e zeros of σ(θ) in Equation 30
$(a_1, \ldots, a_e)$ ← solve Equation 28
for i = 1 to e do
    $x_i \leftarrow z_i^{-1}$
    $m_i \leftarrow \dfrac{a_1 + a_2z_i + \cdots + a_ez_i^{e-1}}{\prod_{j \ne i}\left(1 - z_iz_j^{-1}\right)}$
endfor
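The procedure can be sketched in code for the code of Theorem 53, under stated assumptions: q = 7, n = 6 and d = 5 (so t = 2), with position i labelled by the field element i as in Equation 25. The number of errors e is taken as the largest e ≤ t for which the e × e system from Equation 29 is non-singular (this is Peterson's approach to the step "e ← ..."), and the zeros of σ are found by trying every non-zero field element. All function names are hypothetical; this is a sketch, not an efficient decoder.

```python
P, N, T = 7, 6, 2   # assumed example: GF(7), n = 6, corrects t = 2 errors (d = 5)


def inv(a):
    return pow(a, P - 2, P)                 # inverse in GF(P), P prime (Fermat)


def poly_at(coeffs, th):
    """Evaluate c_0 + c_1 th + c_2 th^2 + ... over GF(P)."""
    return sum(c * pow(th, k, P) for k, c in enumerate(coeffs)) % P


def syndromes(r):
    """s_j = sum_i r_i i^(j-1) for j = 1, ..., 2t (Equation 25)."""
    return [sum(r[i - 1] * pow(i, j - 1, P) for i in range(1, N + 1)) % P
            for j in range(1, 2 * T + 1)]


def solve_mod(M, rhs):
    """Solve M b = rhs over GF(P) by Gauss-Jordan elimination; None if M is singular."""
    n = len(M)
    A = [list(row) + [v] for row, v in zip(M, rhs)]
    for c in range(n):
        piv = next((r for r in range(c, n) if A[r][c] % P), None)
        if piv is None:
            return None
        A[c], A[piv] = A[piv], A[c]
        f = inv(A[c][c])
        A[c] = [x * f % P for x in A[c]]
        for r in range(n):
            if r != c and A[r][c]:
                g = A[r][c]
                A[r] = [(x - g * y) % P for x, y in zip(A[r], A[c])]
    return [A[r][n] for r in range(n)]


def decode(r):
    s = [0] + syndromes(r)                  # s[1..2t]; s[0] is unused
    if not any(s):
        return list(r)
    for e in range(T, 0, -1):               # Equation 29 with t replaced by e, b_0 = 1
        M = [[s[k - j] for j in range(1, e + 1)] for k in range(e + 1, 2 * e + 1)]
        b = solve_mod(M, [-s[k] % P for k in range(e + 1, 2 * e + 1)])
        if b is not None:
            break
    else:
        raise ValueError("more than t errors")
    b = [1] + b                             # sigma(theta) = 1 + b_1 theta + ... + b_e theta^e
    a = [sum(b[j] * s[k - j] for j in range(k)) % P for k in range(1, e + 1)]   # Equation 28
    zeros = [z for z in range(1, P) if poly_at(b, z) == 0]                     # Equation 30
    if len(zeros) != e:
        raise ValueError("error locator has the wrong number of zeros")
    xs = [inv(z) for z in zeros]            # error positions x_i = z_i^{-1}
    c = list(r)
    for z, x in zip(zeros, xs):
        den = 1
        for x2 in xs:
            if x2 != x:
                den = den * (1 - x2 * z) % P
        c[x - 1] = (c[x - 1] - poly_at(a, z) * inv(den)) % P   # subtract the magnitude m_i
    return c
```

For example, the word (1, 2, 3, 4, 5, 6) has all syndromes zero and so is a code word; if it is received as (1, 5, 3, 4, 2, 6), with errors of magnitude 3 and 4 in positions 2 and 5, `decode` recovers (1, 2, 3, 4, 5, 6).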
Definition 79 linear independence
Let V be a vector space over $F_q$. Then a set of vectors $A = \{v_1, \ldots, v_k\}$ in V is said to be linearly independent if and only if a linear combination $\lambda_1v_1 + \cdots + \lambda_kv_k$ being the zero vector implies that $\lambda_i = 0$ for i = 1, ..., k.
Definition 80 linear span of a subset


Let V be a vector space over $F_q$. Let $S = \{v_1, \ldots, v_k\}$ be a non-empty subset of V. Then the linear span ⟨S⟩ of S is defined as
$$\langle S \rangle = \Big\{\sum_{i=1}^{k} \lambda_iv_i : \lambda_i \in F_q\Big\}$$
We say that the span ⟨S⟩ of S is the subspace of V generated or spanned by S. Let C be a subspace of V; then a subset S of C is called a generating or spanning set of C if C = ⟨S⟩.
Definition 81 inner product of vectors
An inner product on $F_q^n$ is a mapping $(\cdot,\cdot) : F_q^n \times F_q^n \to F_q$ such that, for all u, v, w in $F_q^n$,
a. (u + v, w) = (u, w) + (v, w)
b. (av, w) = a(v, w), where a is a scalar
c. (v, w) = (w, v)
d. (u, v) = 0 for all u in $F_q^n$ if and only if v = 0
Definition 82 scalar product


Let v and w be two vectors in $F_q^n$. Then the scalar product, also known as the dot product or Euclidean inner product, between v and w is defined as $v \cdot w = \sum_{i=1}^{n} v_iw_i \in F_q$. The two vectors are said to be orthogonal to each other if and only if $v \cdot w = 0$. The orthogonal complement $S^\perp$ of a non-empty subset S of $F_q^n$ is defined to be
$$S^\perp = \{v \in F_q^n : v \cdot s = 0 \text{ for all } s \in S\}$$
When S = ∅ we define $S^\perp = F_q^n$.
Note 16 orthogonal complement


The orthogonal complement $S^\perp$ of a non-empty subset S of a vector space $F_q^n$ is always a subspace of $F_q^n$. Moreover, $\langle S \rangle^\perp = S^\perp$.
Definition 83 basis


Let V be a vector space over $F_q$. Then a non-empty subset $A = \{v_1, \ldots, v_n\}$ of V is called a basis for V if V = ⟨A⟩ and A is linearly independent.
Theorem 54 dimension of a vector space
Let V be a vector space over $F_q$. If dim V = k, then V has $q^k$ elements and
$$\frac{1}{k!}\prod_{i=0}^{k-1}\left(q^k - q^i\right)$$
different bases.
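Proof. If the basis for V is $B = \{v_1, \ldots, v_k\}$ and $\lambda_1, \ldots, \lambda_k$ are in $F_q$, then every element of V has the form
$$v = \sum_{i=1}^{k} \lambda_iv_i$$
Since $|F_q| = q$, there are q choices for each $\lambda_i$, and since B is linearly independent, different choices give different vectors. Therefore V has exactly $q^k$ elements.

Let $B = \{v_1, \ldots, v_k\}$ be a basis for V. Since B is linearly independent, $v_1 \ne 0$, and there are $q^k - 1$ choices for $v_1$. Then there are $q^k - q^{i-1}$ choices of $v_i$, for i = 2, ..., k, because $v_i \notin \langle v_1, \ldots, v_{i-1} \rangle$, a subspace with $q^{i-1}$ elements. Therefore there are $\prod_{i=0}^{k-1}(q^k - q^i)$ distinct ordered k-tuples $(v_1, \ldots, v_k)$. The order of $v_1, \ldots, v_k$ is irrelevant, hence the number of distinct bases for V is
$$\frac{1}{k!}\prod_{i=0}^{k-1}\left(q^k - q^i\right)$$
□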
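The counting argument can be checked for tiny cases. This sketch, assuming q is prime so that $F_q$ is the integers modulo q, counts by brute force the k-element subsets of $F_q^k$ that span $F_q^k$ (these are exactly the bases) and compares the count with the formula of Theorem 54. Function names are illustrative.

```python
from itertools import combinations, product
from math import factorial, prod


def span_size(vectors, q):
    """Number of distinct linear combinations of the given vectors over F_q (q prime)."""
    k = len(vectors[0])
    combos = set()
    for coeffs in product(range(q), repeat=len(vectors)):
        combos.add(tuple(sum(c * v[i] for c, v in zip(coeffs, vectors)) % q for i in range(k)))
    return len(combos)


def count_bases_brute(q, k):
    """Count the k-subsets of F_q^k that span F_q^k."""
    nonzero = [v for v in product(range(q), repeat=k) if any(v)]
    return sum(1 for s in combinations(nonzero, k) if span_size(list(s), q) == q ** k)


def count_bases_formula(q, k):
    """(1/k!) * prod_{i=0}^{k-1} (q^k - q^i), as in Theorem 54."""
    return prod(q ** k - q ** i for i in range(k)) // factorial(k)


print(count_bases_brute(2, 3), count_bases_formula(2, 3))   # 28 28
```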
Corollary 54[1] dimension and size of a code


Let C be a linear code of length n over $F_q$. Then $\dim C = \log_q |C|$; in other words, $|C| = q^{\dim C}$.
Theorem 55 span and dual

Let S be a subset of $F_q^n$. Then $\dim \langle S \rangle + \dim S^\perp = n$.

Proof. When ⟨S⟩ = {0}, this is obvious, since then $S^\perp = F_q^n$. Next, consider the case dim ⟨S⟩ = k, where 1 ≤ k ≤ n. Let $\{v_1, \ldots, v_k\}$ be a basis of ⟨S⟩; then we need to show that $\dim S^\perp = \dim \langle S \rangle^\perp = n - k$. Now x is in $S^\perp$ if and only if $v_1 \cdot x = \cdots = v_k \cdot x = 0$, or equivalently $Ax^T = 0$, where the k × n matrix A is
$$A = \begin{pmatrix} v_1 \\ \vdots \\ v_k \end{pmatrix}$$
We know that the rows of A are linearly independent. Then $Ax^T = 0$ is a linear system of k linearly independent equations in n variables, where n ≥ k, and therefore admits a solution space of dimension n − k. □
Corollary 55[1]


Let C be a linear code of length n over $F_q$. Then $C^\perp$ is also a linear code, and $\dim C + \dim C^\perp = n$.


Proof. This follows from Note 16 and Theorem 55 above. □
Theorem 56 double orthogonal
Let C be a linear code of length n over $F_q$. Then $(C^\perp)^\perp = C$.
Proof. From Corollary 55[1], we have $\dim C + \dim C^\perp = n$ and $\dim C^\perp + \dim (C^\perp)^\perp = n$, and hence $\dim C = \dim (C^\perp)^\perp$. Let c be in C. Then for all x in $C^\perp$, we have $c \cdot x = 0$; hence $C \subseteq (C^\perp)^\perp$, and since both have the same dimension, $C = (C^\perp)^\perp$. □
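Corollary 55[1] and Theorem 56 can be illustrated by brute force on a small binary code. The sketch takes the code C of length 5 spanned by 11010, 01101 and 00111 (the code of Quiz 2 in the Appendix), computes $C^\perp$ directly from Definition 82, and checks that $\dim C + \dim C^\perp = 5$ and $(C^\perp)^\perp = C$. Names are illustrative.

```python
from itertools import product

N = 5


def dot(u, v):
    """Scalar product over F_2."""
    return sum(a * b for a, b in zip(u, v)) % 2


def span(basis):
    """All F_2-linear combinations of the given words."""
    words = set()
    for coeffs in product((0, 1), repeat=len(basis)):
        words.add(tuple(sum(c * b[i] for c, b in zip(coeffs, basis)) % 2 for i in range(N)))
    return words


def dual(code):
    """C-perp: all words of F_2^n orthogonal to every word of C."""
    return {v for v in product((0, 1), repeat=N) if all(dot(v, c) == 0 for c in code)}


C = span([(1, 1, 0, 1, 0), (0, 1, 1, 0, 1), (0, 0, 1, 1, 1)])
C_perp = dual(C)
print(len(C), len(C_perp), dual(C_perp) == C)   # 8 4 True
```

Since $|C| = 2^{\dim C}$ by Corollary 54[1], the sizes 8 and 4 correspond to dimensions 3 and 2.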
Definition 84 linear code


A linear code of length n over $F_q$ is a subspace of $F_q^n$. The dual code $C^\perp$ of C is the orthogonal complement of the subspace C of $F_q^n$. The dimension of the linear code C is the dimension of C as a vector space over $F_q$, that is to say, dim C. A linear code C of length n and dimension k over $F_q$ is called a q-ary [n, k]-code, or an $(n, q^k)$-linear code. If the distance d of C is known, it is called an [n, k, d]-linear code. Furthermore, C is said to be self-orthogonal if $C \subseteq C^\perp$, and self-dual if $C = C^\perp$.
Definition 85 Hamming weight


Let x be a word in $F_q^n$. Then the Hamming weight w(x) of x is defined as the number of non-zero letters in x. In other words, w(x) = d(x, 0), where 0 is the zero word and d(x, y) is the Hamming distance between two words x and y in $F_q^n$. For each element x of $F_q$, the Hamming weight may be defined as
$$w(x) = \begin{cases} 1, & \text{if } x \ne 0 \\ 0, & \text{if } x = 0 \end{cases}$$
Then for $x = (x_1, \ldots, x_n)$ in $F_q^n$,
$$w(x) = w(x_1) + \cdots + w(x_n)$$
Theorem 57 Hamming weight and distance
Let x and y be two words in $F_q^n$. Then d(x, y) = w(x − y).
Proof. For each pair of letters x and y in $F_q$, we know that d(x, y) = 0 if and only if x = y, that is, if and only if x − y = 0, or equivalently w(x − y) = 0; hence d(x, y) = w(x − y) for letters. The proof follows since $w(x) = \sum_{i=1}^{n} w(x_i)$ and $d(x, y) = \sum_{i=1}^{n} d(x_i, y_i)$. □
Corollary 57[1] q even
Let q be even, that is, a power of 2. Then, for any two words x and y in $F_q^n$, we have d(x, y) = w(x + y).


Proof. The proof follows from the fact that a = −a for all a in $F_q$ when q is even. □
Theorem 58 inequality
Let x and y be two words in $F_2^n$. Then w(x) + w(y) ≥ w(x + y).
Proof. For $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$ in $F_2^n$, let
$$x * y = (x_1y_1, \ldots, x_ny_n)$$
Then, for q = 2 and n = 1,


x   y   x*y   w(x) + w(y) − 2w(x*y)   w(x + y)
0   0   0     0                       0
0   1   0     1                       1
1   0   0     1                       1
1   1   1     0                       0


From this, together with Definition 85, we know that
$$w(x + y) = w(x) + w(y) - 2w(x * y)$$
for x and y in $F_2^n$, and since $w(x * y) \ge 0$ the inequality follows. □
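The identity used in the proof can be verified exhaustively for short binary words. In the sketch below `weight` implements Definition 85 and `*` is taken as the componentwise product; the word length n is a parameter and the names are illustrative.

```python
from itertools import product


def weight(x):
    """Hamming weight: the number of non-zero letters of x."""
    return sum(1 for xi in x if xi)


def check_identity(n):
    """Check w(x + y) = w(x) + w(y) - 2 w(x * y) for all x, y in F_2^n."""
    for x in product((0, 1), repeat=n):
        for y in product((0, 1), repeat=n):
            s = tuple((a + b) % 2 for a, b in zip(x, y))
            star = tuple(a * b for a, b in zip(x, y))
            if weight(s) != weight(x) + weight(y) - 2 * weight(star):
                return False
    return True


print(check_identity(4))   # True
```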
Problem 10
Prove, for any prime power q and x, y in $F_q^n$, that
$$w(x) + w(y) \ge w(x + y) \ge w(x) - w(y)$$
Definition 86 elementary row operation


Let A be a matrix over $F_q$. An elementary row operation performed on A is any one of the following.

a. interchange of two rows

b. multiplication of a row by a non-zero scalar

c. replacement of a row by its sum with a scalar multiple of another row
Two matrices are said to be row equivalent to each other if one is obtainable from the other by a sequence of elementary row operations.
Definition 87 equivalent matrices
Any matrix is row equivalent to a matrix in row echelon (RE) form, and to one in reduced row echelon (RRE) form, each obtained by a sequence of elementary row operations performed on it. The RRE form of any given matrix is unique, but its RE forms may not be.
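A minimal sketch of reduction to RRE form over $F_p$ (p prime, assumed), using only the three elementary row operations of Definition 86; the function name is illustrative.

```python
def rref(rows, p):
    """Reduced row echelon form over F_p (p prime) by elementary row operations."""
    a = [[x % p for x in row] for row in rows]
    n_rows, n_cols = len(a), len(a[0]) if a else 0
    lead = 0
    for c in range(n_cols):
        pivot = next((r for r in range(lead, n_rows) if a[r][c]), None)
        if pivot is None:
            continue
        a[lead], a[pivot] = a[pivot], a[lead]                 # (a) interchange two rows
        inv = pow(a[lead][c], p - 2, p)
        a[lead] = [x * inv % p for x in a[lead]]              # (b) scale by a non-zero scalar
        for r in range(n_rows):
            if r != lead and a[r][c]:
                f = a[r][c]
                a[r] = [(x - f * y) % p for x, y in zip(a[r], a[lead])]   # (c) add a multiple
        lead += 1
        if lead == n_rows:
            break
    return a
```

Applied to the binary matrix with rows 11010, 10111, 01010, 01101 (the set S of Quiz 2), `rref(..., 2)` returns the rows 10000, 01010, 00111 and a zero row. These span the same code as the row echelon basis found in Quiz 2, illustrating that RE forms differ while the RRE form is unique.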
Appendix


Course Outline 


Week Date Topic of lecture Hours 
1 28 Oct 2005 Error and distance 3 
2 4 Nov 2005 Entropy and mutual information 3 
3 11 Nov 2005 Group, field and finite field 3 
4 18 Nov 2005 Bounds in coding 3 
5 25 Nov 2005 Group, polynomial & Hamming codes 3 
6 2 Dec 2005 Finite field- and BCH codes 3 
7 9 Dec 2005 Linear codes 3 
8 6 Jan 2006 Cyclic codes 3 
9 13 Jan 2006 Goppa codes 3 
10 20 Jan 2006 MDS code 3 
11 10 Feb 2006 Cryptography 3 
Quiz 1
Coding Theory


20th January 2006
Time: 1 hour (12:30-1:30pm)


1. Write the addition [3] and multiplication [4] tables for Z_6.


Solution. The addition table,

+   0 1 2 3 4 5
0   0 1 2 3 4 5
1   1 2 3 4 5 0
2   2 3 4 5 0 1
3   3 4 5 0 1 2
4   4 5 0 1 2 3
5   5 0 1 2 3 4
#
The multiplication table,
×   0 1 2 3 4 5
0   0 0 0 0 0 0
1   0 1 2 3 4 5
2   0 2 4 0 2 4
3   0 3 0 3 0 3
4   0 4 2 0 4 2
5   0 5 4 3 2 1
#


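2. Given the ISBN 0198538□30, find the missing digit □. [3]


Solution. For an ISBN $x_1x_2 \cdots x_{10}$,
$$\sum_{i=1}^{10} i\,x_i \equiv 0 \pmod{11}$$
Writing y for □,
$$0(1) + 1(2) + 9(3) + 8(4) + 5(5) + 3(6) + 8(7) + y(8) + 3(9) + 0(10) = 187 + 8y \equiv 0 \pmod{11}$$
Since 187 = 17 × 11, this reduces to $8y \equiv 0 \pmod{11}$, and since 8 is invertible modulo 11, y = 0. The ISBN is therefore 0 19 853803 0.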
# 
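The same check-digit computation as a short sketch. It assumes the unknown digit is written as "?" and returns the value y in 0, ..., 10 that makes $\sum i x_i \equiv 0 \pmod{11}$ (a value of 10 would be written X, which is only allowed in the last position). The function name is hypothetical.

```python
def isbn10_missing_digit(isbn):
    """Return the digit y replacing '?' so that sum_{i=1}^{10} i x_i = 0 (mod 11)."""
    k = isbn.index("?") + 1                      # weight of the missing position
    partial = sum(i * (10 if c == "X" else int(c))
                  for i, c in enumerate(isbn, start=1) if c != "?")
    # k is invertible modulo the prime 11, so exactly one y in 0..10 works.
    return next(y for y in range(11) if (partial + k * y) % 11 == 0)


print(isbn10_missing_digit("0198538?30"))   # 0
```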


Numbers between square brackets are marks.




3. Let $f(x) = 1 + x^2 + x^3$. Show whether f(x) is irreducible over Z_2. [4] Then find $Z_2[x]/(f(x))$. [4] And then draw the addition [5] and multiplication [7] tables of $Z_2[x]/(f(x))$.


Solution. We note that f(x) is of degree 3. Suppose f(x) were reducible. Then it would have a linear factor x or 1 + x, which would make 0 or 1 a root of f(x). But f(0) = f(1) = 1 ≠ 0 in Z_2. Therefore f(x) is irreducible.
# 


$$Z_2[x]/(1 + x^2 + x^3) = \{0,\ 1,\ x,\ 1 + x,\ x^2,\ x + x^2,\ 1 + x^2,\ 1 + x + x^2\}$$
The addition table,




+        0        1        x        1+x      x²       x+x²     1+x²     1+x+x²
0        0        1        x        1+x      x²       x+x²     1+x²     1+x+x²
1        1        0        1+x      x        1+x²     1+x+x²   x²       x+x²
x        x        1+x      0        1        x+x²     x²       1+x+x²   1+x²
1+x      1+x      x        1        0        1+x+x²   1+x²     x+x²     x²
x²       x²       1+x²     x+x²     1+x+x²   0        x        1        1+x
x+x²     x+x²     1+x+x²   x²       1+x²     x        0        1+x      1
1+x²     1+x²     x²       1+x+x²   x+x²     1        1+x      0        x
1+x+x²   1+x+x²   x+x²     1+x²     x²       1+x      1        x        0
#
The multiplication table,
×        0        1        x        1+x      x²       x+x²     1+x²     1+x+x²
0        0        0        0        0        0        0        0        0
1        0        1        x        1+x      x²       x+x²     1+x²     1+x+x²
x        0        x        x²       x+x²     1+x²     1        1+x+x²   1+x
1+x      0        1+x      x+x²     1+x²     1        1+x+x²   x        x²
x²       0        x²       1+x²     1        1+x+x²   x        1+x      x+x²
x+x²     0        x+x²     1        1+x+x²   x        1+x      x²       1+x²
1+x²     0        1+x²     1+x+x²   x        1+x      x²       x+x²     1
1+x+x²   0        1+x+x²   1+x      x²       x+x²     1+x²     1        x


Midterm Examination


Coding Theory


27th January 2006


Time: 2 hours (12:30-2:30pm)


1. Our task is to design a Hamming code in the case where there are 4 check digits. First find the binary-representation matrix M to be used for the purpose. [1] Find the length of the code word [1] and that of a message word [1]. What is the redundancy of this code? [1] Then construct the code. [5] And then encode the message words 10101010101 [3] and 00111100101 [3].


Solution. We have r = 4, $n = 2^4 - 1 = 15$ and $m = 2^4 - 4 - 1 = 11$. Each code word has 15 digits, of which 4 are check digits.
#


Each message word has 11 digits. The redundancy of the code is 4.


Numbers placed between square brackets are marks. 
[Figure: layout of the code word b_1 b_2 ··· b_15, with the check digits b_1, b_2, b_4, b_8 and the message word a_1 ··· a_11 placed in positions 3, 5, 6, 7 and 9 to 15.]
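The $(2^r - 1) \times r = 15 \times 4$ matrix M has as its i-th row the binary representation of i, for i = 1, ..., 15. Written out as its transpose,
$$M^T = \begin{pmatrix} 0&0&0&0&0&0&0&1&1&1&1&1&1&1&1 \\ 0&0&0&1&1&1&1&0&0&0&0&1&1&1&1 \\ 0&1&1&0&0&1&1&0&0&1&1&0&0&1&1 \\ 1&0&1&0&1&0&1&0&1&0&1&0&1&0&1 \end{pmatrix}$$
Then form the matrix equation
$$(b_1\ b_2\ \cdots\ b_{15})\,M = 0$$
which gives four equations; once the message digits are in place, the only unknowns are the four check digits $b_1$, $b_2$, $b_4$ and $b_8$:
$$\begin{aligned} b_8 + b_9 + b_{10} + b_{11} + b_{12} + b_{13} + b_{14} + b_{15} &= 0 \\ b_4 + b_5 + b_6 + b_7 + b_{12} + b_{13} + b_{14} + b_{15} &= 0 \\ b_2 + b_3 + b_6 + b_7 + b_{10} + b_{11} + b_{14} + b_{15} &= 0 \\ b_1 + b_3 + b_5 + b_7 + b_9 + b_{11} + b_{13} + b_{15} &= 0 \end{aligned}$$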


The message word 10101010101 becomes the code word
b_1 b_2 1 b_4 0 1 0 b_8 1 0 1 0 1 0 1
Hence $b_8 = 0$, $b_4 = 1$, $b_2 = 0$ and $b_1 = 1$, and the code word is


101101001010101


The message 00111100101 becomes the code word
b_1 b_2 0 b_4 0 1 1 b_8 1 1 0 0 1 0 1
which gives us $b_8 = 0$, $b_4 = 0$, $b_2 = 0$ and $b_1 = 0$, since $b_3 + b_5 + b_7 + b_9 + b_{11} + b_{13} + b_{15} = 0 + 0 + 1 + 1 + 0 + 1 + 1$ is even. Therefore the code word is


000001101100101.
# 
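The construction above can be sketched as a small encoder. It assumes check digits at positions 1, 2, 4, ..., $2^{r-1}$ and message digits in the remaining positions in increasing order, as in the layout used in the solution, and sets each check digit so that the corresponding column of $(b_1 \cdots b_n)M$ is 0. The function name is hypothetical.

```python
def hamming_encode(message, r=4):
    """Encode a binary message string with the Hamming code having r check digits."""
    n = 2 ** r - 1
    if len(message) != n - r:
        raise ValueError(f"message must have {n - r} digits")
    b = [0] * (n + 1)                       # b[1..n]; b[0] is unused
    digits = iter(int(c) for c in message)
    for pos in range(1, n + 1):
        if pos & (pos - 1):                 # not a power of two: a message position
            b[pos] = next(digits)
    for j in range(r):
        check = 1 << j                      # check digit b_1, b_2, b_4, b_8, ...
        # The digits whose position has bit j set (one column of M) must sum to 0.
        b[check] = sum(b[pos] for pos in range(1, n + 1) if pos & check and pos != check) % 2
    return "".join(map(str, b[1:]))
```

Each check digit is computed from message positions only, since no other power of two shares its bit, so the order in which the check digits are filled in does not matter.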
2. Show that the polynomial $x^3 + x^2 + 1$ is irreducible [4] and that the element $\alpha = x + (x^3 + x^2 + 1)$ of $F_2[x]/(x^3 + x^2 + 1)$ is primitive [4]. Then use this to construct a binary BCH code of length 7 and minimum distance 3. [5] And then find the code word for the message 1101. [2]


Solution. Suppose $x^3 + x^2 + 1$ is reducible; then it must have either x or x + 1 as a factor, so x = 0 or x = 1 is a root of $x^3 + x^2 + 1$. But
$$x^3 + x^2 + 1 = x(x^2 + x) + 1 \quad\text{and}\quad x^3 + x^2 + 1 = (x + 1)x^2 + 1$$
from which we can see that dividing $x^3 + x^2 + 1$ by x and by x + 1 both leave the remainder 1. Therefore neither x nor x + 1 divides $x^3 + x^2 + 1$, hence the latter is irreducible.
# 


We know that f(x) in $F_p[x]$ of degree n is primitive if $f(x) \mid x^{p^n - 1} - 1$ and $f(x) \nmid x^k - 1$ for any $k < p^n - 1$. Here p = 2 and n = 3, so $x^{p^n - 1} - 1 = x^7 - 1$, and over $F_2$ we have $x^k - 1 = x^k + 1$. Long division gives
$$x^7 + 1 = (x^3 + x^2 + 1)(x^4 + x^3 + x^2 + 1)$$
so that $x^3 + x^2 + 1 \mid x^7 - 1$. For k < 7,
$$\begin{aligned} x^6 + 1 &= (x^3 + x^2 + 1)(x^3 + x^2 + x) + (x^2 + x + 1) \\ x^5 + 1 &= (x^3 + x^2 + 1)(x^2 + x + 1) + x \\ x^4 + 1 &= (x^3 + x^2 + 1)(x + 1) + (x^2 + x) \\ x^3 + 1 &= (x^3 + x^2 + 1) + x^2 \end{aligned}$$
and all the remainders are non-zero, while for k ≤ 2 the polynomial $x^k + 1$ has degree less than 3 and cannot be divisible by $x^3 + x^2 + 1$. Therefore $x^3 + x^2 + 1 \nmid x^k - 1$ for k < 7, and $\alpha = x + (x^3 + x^2 + 1)$ is a primitive element of $F_2[x]/(x^3 + x^2 + 1)$.


Then $\alpha$ satisfies $\alpha^3 + \alpha^2 + 1 = 0$. For $F$ a finite field of order $p^n$ with $k$ as its prime subfield, we know that $\alpha$ and $\alpha^p$ have the same minimum polynomial over $k$ for every $\alpha \in F$. Here $p = 2$; therefore $\alpha$ and $\alpha^2$ have the same minimum polynomial $x^3 + x^2 + 1$. A BCH code of designed distance 3 must have $\alpha$ and $\alpha^2$ as roots, so its generating polynomial is the least common multiple of their minimum polynomials, namely $x^3 + x^2 + 1$. Let the message be $a_0a_1a_2a_3$. Then the message polynomial is $a(x) = a_0 + a_1x + a_2x^2 + a_3x^3$, and the corresponding code polynomial is $a(x)(x^3 + x^2 + 1)$. In other words,

$$a_0 + a_1x + (a_0 + a_2)x^2 + (a_0 + a_1 + a_3)x^3 + (a_1 + a_2)x^4 + (a_2 + a_3)x^5 + a_3x^6.$$

The code word is thus

$$(a_0,\ a_1,\ a_0 + a_2,\ a_0 + a_1 + a_3,\ a_1 + a_2,\ a_2 + a_3,\ a_3).$$


For the message 1101, the code word is 1111111. 
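The encoding map $a(x) \mapsto a(x)(x^3 + x^2 + 1)$ can be sketched in a few lines. Messages and code words are written lowest degree first, as in the text ($a_0a_1a_2a_3$); the helper names are illustrative. The last lines enumerate all 16 code words and compute the minimum weight, which for a linear code is the minimum distance and should be 3.

```python
from itertools import product

G = [1, 0, 1, 1]  # 1 + x^2 + x^3, coefficients lowest degree first


def gf2_poly_mul(a: list[int], b: list[int]) -> list[int]:
    """Product of two polynomials over GF(2), coefficient lists lowest degree first."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                out[i + j] ^= bj
    return out


def encode(message: str) -> str:
    """Map the message a0 a1 a2 a3 to the code word of a(x)(x^3 + x^2 + 1)."""
    return "".join(map(str, gf2_poly_mul([int(c) for c in message], G)))


print(encode("1101"))
code = [encode("".join(m)) for m in product("01", repeat=4)]
print(min(w.count("1") for w in code if "1" in w))
```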
Quiz 2
Coding Theory


3rd February 2006
Time: 1 hour (12:30-1:30pm)


1. Let S = {11010, 10111, 01010, 01101}. Find a basis for $C = \langle S \rangle$. [5] What is the dimension k? [1] Find the binary code C. [4] Find also all cosets of C. [10]


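Solution. Form our matrix A from the words of S and row-reduce it over $\mathbb{F}_2$ (add row 1 to row 2, then the new row 2 to rows 3 and 4):

$$A = \begin{pmatrix} 1&1&0&1&0\\ 1&0&1&1&1\\ 0&1&0&1&0\\ 0&1&1&0&1 \end{pmatrix} \sim \begin{pmatrix} 1&1&0&1&0\\ 0&1&1&0&1\\ 0&0&1&1&1\\ 0&0&0&0&0 \end{pmatrix}.$$

Hence the basis is {11010, 01101, 00111}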


# 
The basis has three vectors, therefore the dimension k is 3.
# 
The binary code is 
C = {00000, 11010, 01101, 00111, 10111, 11101, 01010, 10000} 
# 
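A quick way to check the row reduction is to generate $C = \langle S \rangle$ directly. The sketch below, with illustrative names `add` and `span` and binary words held as strings, closes $\{00000\}$ under addition of each word of S and reads the dimension off $|C| = 2^k$.

```python
S = ["11010", "10111", "01010", "01101"]


def add(u: str, v: str) -> str:
    """Componentwise sum of two binary words over GF(2)."""
    return "".join("1" if a != b else "0" for a, b in zip(u, v))


def span(vectors: list[str]) -> set[str]:
    """All GF(2) linear combinations of the given words."""
    code = {"0" * len(vectors[0])}
    for v in vectors:
        code |= {add(c, v) for c in code}
    return code


C = span(S)
k = len(C).bit_length() - 1  # |C| = 2^k
print(sorted(C), "k =", k)
```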


      Coset leader   Words of the coset
I     00000 + C      00000, 11010, 01101, 00111, 10111, 11101, 01010, 10000
II    00001 + C      00001, 11011, 01100, 00110, 10110, 11100, 01011, 10001
III   00010 + C      00010, 11000, 01111, 00101, 10101, 11111, 01000, 10010
IV    00100 + C      00100, 11110, 01001, 00011, 10011, 11001, 01110, 10100


Any other choice of coset leader gives one of these four cosets: there are $2^5/2^3 = 4$ cosets in all, and together the four listed contain all 32 words of $\mathbb{F}_2^5$. Therefore the number of cosets is four and all of them are listed above.


# 
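The coset table can be produced the same way. The sketch below assumes that leaders are chosen in order of increasing weight and then lexicographically, a choice of ours that happens to reproduce the leaders 00000, 00001, 00010 and 00100 used above; each coset is removed from the pool of unused words before the next leader is picked.

```python
from itertools import product

# Code words of C in the order used in the coset table
C = ["00000", "11010", "01101", "00111", "10111", "11101", "01010", "10000"]


def add(u: str, v: str) -> str:
    """Componentwise sum of two binary words over GF(2)."""
    return "".join("1" if a != b else "0" for a, b in zip(u, v))


def cosets(code: list[str], n: int) -> dict[str, list[str]]:
    """Map each chosen coset leader to the coset leader + code."""
    unseen = {"".join(w) for w in product("01", repeat=n)}
    table = {}
    for leader in sorted(unseen, key=lambda w: (w.count("1"), w)):
        if leader in unseen:
            coset = [add(leader, c) for c in code]
            table[leader] = coset
            unseen -= set(coset)
    return table


table = cosets(C, 5)
for leader, words in table.items():
    print(leader, ", ".join(words))
```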


2. Based on the factorisation $x^6 - 1 = (1 + x)^2(1 + x + x^2)^2$, find a binary [6, 3] cyclic code. [10]


Solution. 




Here we are given k = 3 and n = 6. List all nine monic divisors $g(x)$ of $x^6 - 1$ and note the dimension $k = 6 - \deg g$, that is the number of vectors in a basis, of the code each one generates; the basis is $x^i g(x)$, $i < k$.

$1$ → $k = 6$
$1 + x$ → $k = 5$
$1 + x + x^2$ → $k = 4$
$(1 + x)^2$ → $k = 4$
$(1 + x)(1 + x + x^2)$ → $k = 3$
$(1 + x)^2(1 + x + x^2)$ → $k = 2$
$(1 + x + x^2)^2$ → $k = 2$
$(1 + x)(1 + x + x^2)^2$ → $k = 1$
$1 + x^6$ → $k = 0$
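The list of divisors can be generated by brute force. Over $\mathbb{F}_2$ every non-zero polynomial is monic, so the sketch below simply tests every non-zero polynomial of degree at most 6 for divisibility into $x^6 - 1$ and prints $k = 6 - \deg g$. Polynomials are stored as integers with bit $i$ the coefficient of $x^i$; the helper names are illustrative.

```python
N = 6
X_N_MINUS_1 = (1 << N) | 1  # x^6 - 1 = x^6 + 1 over GF(2); bit i = coefficient of x^i


def gf2_mod(a: int, b: int) -> int:
    """Remainder of a modulo a non-zero b in GF(2)[x]."""
    deg_b = b.bit_length() - 1
    while a and a.bit_length() - 1 >= deg_b:
        a ^= b << (a.bit_length() - 1 - deg_b)
    return a


def show(g: int) -> str:
    """Write g as a sum of powers of x, lowest degree first."""
    terms = ["1" if i == 0 else "x" if i == 1 else f"x^{i}"
             for i in range(g.bit_length()) if (g >> i) & 1]
    return " + ".join(terms)


divisors = [g for g in range(1, 1 << (N + 1)) if gf2_mod(X_N_MINUS_1, g) == 0]
for g in divisors:
    print(f"{show(g):<28} k = {N - (g.bit_length() - 1)}")
```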


For k = 3,
$(1 + x)(1 + x + x^2) = 1 + x^3$, and

$0 \cdot (1 + x^3) \to 000000$
$1 \cdot (1 + x^3) \to 100100$
$x \cdot (1 + x^3) \to 010010$
$x^2 \cdot (1 + x^3) \to 001001$

Adding the last three together two or three at a time gives us the remaining code words. Then


C = {000000, 100100, 010010, 001001, 110110, 101101, 011011, 111111}
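The code for k = 3 can be enumerated by multiplying every message polynomial of degree below 3 by $g(x) = 1 + x^3$. The sketch below does this and also checks the defining property of a cyclic code, namely that a cyclic shift of any code word is again a code word; the names are illustrative.

```python
from itertools import product

N, K = 6, 3
G = [1, 0, 0, 1]  # g(x) = 1 + x^3, lowest degree first


def encode(m: tuple[int, ...]) -> str:
    """Code word of m(x) g(x), where deg m < K."""
    word = [0] * N
    for i, mi in enumerate(m):
        if mi:
            for j, gj in enumerate(G):
                word[i + j] ^= gj
    return "".join(map(str, word))


code = {encode(m) for m in product((0, 1), repeat=K)}
# a cyclic shift of any code word must again be a code word
is_cyclic = all(w[-1] + w[:-1] in code for w in code)
print(sorted(code), is_cyclic)
```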
Quiz 3
Coding Theory
10th February 2006
Time: 1 hour (12:30-1:30pm)
1. Consider the matrix

$$A = \begin{pmatrix} 1 & 3 & 4 \\ 2 & 3 & 1 \\ 2 & 4 & 4 \end{pmatrix}$$

over GF(5). Show that A can give maximum distance separable (MDS) codes. [6] Find two such codes if they exist, and give either a generator matrix or a parity check matrix for each of them. [6] Then give the code words and encoding functions for each. [8]


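Solution. Examine the values of the determinants of all square submatrices of A over GF(5). The entries of A, which are its $1 \times 1$ submatrices, are all non-zero, and

$$\begin{vmatrix}1&3\\2&3\end{vmatrix} = 2,\quad \begin{vmatrix}1&4\\2&1\end{vmatrix} = 3,\quad \begin{vmatrix}3&4\\3&1\end{vmatrix} = 1,\quad \begin{vmatrix}1&3\\2&4\end{vmatrix} = 3,\quad \begin{vmatrix}1&4\\2&4\end{vmatrix} = 1,$$

$$\begin{vmatrix}3&4\\4&4\end{vmatrix} = 1,\quad \begin{vmatrix}2&3\\2&4\end{vmatrix} = 2,\quad \begin{vmatrix}2&1\\2&4\end{vmatrix} = 1,\quad \begin{vmatrix}3&1\\4&4\end{vmatrix} = 3,\quad \begin{vmatrix}1&3&4\\2&3&1\\2&4&4\end{vmatrix} = 3.$$

We can see that no square submatrix of A is singular, therefore we may obtain from A two MDS codes, namely the [6, 3, 4] code over GF(5) with generator matrix $G = (I_3\ A)$ and the [6, 3, 4] code over GF(5) with parity check matrix $H = (A\ I_3)$; the minimum distance is $n - k + 1 = 4$ because an MDS code meets the Singleton bound with equality.
For the first one, the generator matrix is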


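$$G = \begin{pmatrix} 1&0&0&1&3&4\\ 0&1&0&2&3&1\\ 0&0&1&2&4&4 \end{pmatrix}.$$

Then

$$(a_1\ a_2\ a_3\ a_4\ a_5\ a_6) = (a_1\ a_2\ a_3)\begin{pmatrix} 1&0&0&1&3&4\\ 0&1&0&2&3&1\\ 0&0&1&2&4&4 \end{pmatrix},$$

and the encoding functions become

$$\begin{aligned} a_4 &= a_1 + 2a_2 + 2a_3,\\ a_5 &= 3a_1 + 3a_2 + 4a_3,\\ a_6 &= 4a_1 + a_2 + 4a_3. \end{aligned}$$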


# 
# 
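The determinant test and its consequence can both be checked by machine. The sketch below (illustrative names; determinants by plain cofactor expansion, which is adequate for a 3 × 3 matrix) confirms that every square submatrix of A is non-singular over GF(5), then enumerates all $5^3 = 125$ code words of the code with generator matrix $(I_3\ A)$ and computes the minimum weight, which for a linear code is the minimum distance; for an MDS [6, 3] code it should be $n - k + 1 = 4$.

```python
from itertools import combinations, product

P = 5
A = [[1, 3, 4],
     [2, 3, 1],
     [2, 4, 4]]


def det(m: list[list[int]]) -> int:
    """Integer determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def all_minors_nonzero(a: list[list[int]], p: int) -> bool:
    """True if every square submatrix of a is non-singular over GF(p)."""
    rows, cols = len(a), len(a[0])
    for size in range(1, min(rows, cols) + 1):
        for rs in combinations(range(rows), size):
            for cs in combinations(range(cols), size):
                if det([[a[r][c] for c in cs] for r in rs]) % p == 0:
                    return False
    return True


def encode(msg: tuple[int, ...], gen: list[list[int]], p: int = P) -> tuple[int, ...]:
    """Code word msg * gen over GF(p)."""
    return tuple(sum(m * g for m, g in zip(msg, col)) % p for col in zip(*gen))


G = [[int(i == j) for j in range(3)] + A[i] for i in range(3)]  # G = (I_3 | A)
code = [encode(m, G) for m in product(range(P), repeat=3)]
d = min(sum(x != 0 for x in c) for c in code if any(c))
print(all_minors_nonzero(A, P), d)
```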
For the second code, the one defined by the parity check matrix $H = (A\ I_3)$, we obtain the generator matrix

$$G' = (I_3\ \ {-A^T}) = \begin{pmatrix} 1&0&0&4&3&3\\ 0&1&0&2&2&1\\ 0&0&1&1&4&1 \end{pmatrix}.$$

Then

$$(a_1\ a_2\ a_3\ a_4\ a_5\ a_6) = (a_1\ a_2\ a_3)\,G',$$

and the encoding functions become

$$\begin{aligned} a_4 &= 4a_1 + 2a_2 + a_3,\\ a_5 &= 3a_1 + 2a_2 + 4a_3,\\ a_6 &= 3a_1 + a_2 + a_3. \end{aligned}$$

The code C′ has $5^3 = 125$ code words; the rows of G′ and their pairwise sums, for example, are 100433, 010221, 001141, 110104, 101024 and 011312.
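A similar sketch for the second code builds $G' = (I_3\ {-A^T})$ from $H = (A\ I_3)$, checks that every row of G′ is orthogonal to every row of H over GF(5), and encodes the three unit messages and their pairwise sums, which give the six words listed for C′. The variable names are illustrative.

```python
P = 5
A = [[1, 3, 4], [2, 3, 1], [2, 4, 4]]
I3 = [[int(i == j) for j in range(3)] for i in range(3)]

H = [A[i] + I3[i] for i in range(3)]                               # H  = (A | I_3)
minus_AT = [[(-A[j][i]) % P for j in range(3)] for i in range(3)]  # -A^T over GF(5)
G2 = [I3[i] + minus_AT[i] for i in range(3)]                       # G' = (I_3 | -A^T)

# every row of G' must be orthogonal to every row of H over GF(5)
orthogonal = all(sum(g * h for g, h in zip(g_row, h_row)) % P == 0
                 for g_row in G2 for h_row in H)


def encode(msg: tuple[int, int, int]) -> str:
    """Code word (a1 a2 a3) G' written as a string of digits 0-4."""
    return "".join(str(sum(m * g for m, g in zip(msg, col)) % P) for col in zip(*G2))


msgs = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1)]
print(orthogonal, [encode(m) for m in msgs])
```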
Final Examination
Coding Theory


23rd February 2006
Time: 3 hours (12:30-3:30pm)


1. A channel matrix P is a matrix whose elements $p_{ij}$ have the value of the probability that an input $a_i$ in an input alphabet $\Sigma_1$ will produce an output $b_j$ in an output alphabet $\Sigma_2$. The $r$th extension of this channel is the discrete memoryless channel with input alphabet $\Sigma_1^{(r)}$ of words $a_1a_2\dots a_r$, output alphabet $\Sigma_2^{(r)}$ of words $b_1b_2\dots b_r$, and channel matrix $P^{(r)}$ whose components are $p(b_1|a_1)\,p(b_2|a_2)\cdots p(b_r|a_r)$. Furthermore, P is said to be a stochastic matrix if $p_{ij} \ge 0$ for all $i, j$ and all row sums are equal to 1, that is to say,

$$\sum_j p_{ij} = 1.$$


a. Consider a binary symmetric channel, that is to say, a mapping from input to output such that $\Sigma_1 = \Sigma_2 = \{0, 1\}$. The second extension of this channel has input and output alphabet {00, 01, 10, 11}. If the channel matrix for this binary channel is


what would be the channel matrix for the second extension of this chan- 
nel? [4] Find P), if it is stochastic. [2] 

b. Next, consider the binary erasure channel whose input alphabet is $\Sigma_1 = \{0, 1\}$ and output alphabet $\Sigma_2 = \{0, 1, *\}$. The channel matrix in this case may be described as

$$P = \begin{pmatrix} 1 - \varepsilon & 0 & \varepsilon \\ 0 & 1 - \varepsilon & \varepsilon \end{pmatrix}.$$

Find the channel matrix for the second extension of this channel. [9]
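The product rule $p(b_1|a_1)\,p(b_2|a_2)$ for the components of $P^{(2)}$ amounts to taking the Kronecker product $P \otimes P$ when input and output words are listed in lexicographic order; the sketch below, our own illustration, uses that observation. The channel matrix of the binary symmetric channel in part a is not reproduced in the text, so the sketch assumes the usual form with rows $(1-p,\ p)$ and $(p,\ 1-p)$ and a hypothetical crossover probability p = 1/5, together with a hypothetical erasure probability ε = 1/10 for part b. Exact fractions make the row-sum test for stochasticity exact.

```python
from fractions import Fraction


def kron(R: list[list[Fraction]], P: list[list[Fraction]]) -> list[list[Fraction]]:
    """Kronecker product: entry ((i, k), (j, l)) is R[i][j] * P[k][l]."""
    return [[x * y for x in r_row for y in p_row] for r_row in R for p_row in P]


def extension(P: list[list[Fraction]], r: int) -> list[list[Fraction]]:
    """Channel matrix of the r-th extension, words ordered lexicographically."""
    M = [[Fraction(1)]]
    for _ in range(r):
        M = kron(M, P)
    return M


def is_stochastic(M: list[list[Fraction]]) -> bool:
    """Non-negative entries and every row summing to 1."""
    return all(x >= 0 for row in M for x in row) and all(sum(row) == 1 for row in M)


p = Fraction(1, 5)     # hypothetical crossover probability of the BSC
eps = Fraction(1, 10)  # hypothetical erasure probability of the BEC
BSC = [[1 - p, p], [p, 1 - p]]
BEC = [[1 - eps, Fraction(0), eps], [Fraction(0), 1 - eps, eps]]

BSC2, BEC2 = extension(BSC, 2), extension(BEC, 2)
for row in BEC2:  # rows 00, 01, 10, 11; columns 00, 01, 0*, 10, 11, 1*, *0, *1, **
    print([str(x) for x in row])
```

Replacing the numbers by symbols shows, for instance, that the row for input 00 of the erasure channel's second extension is $((1-\varepsilon)^2, 0, (1-\varepsilon)\varepsilon, 0, 0, 0, \varepsilon(1-\varepsilon), 0, \varepsilon^2)$.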
2. 

a. Explain uncertainty and entropy, conditional entropy, and mutual infor- 
mation, giving examples where appropriate. [2] 

b. Let H(X) be the entropy of a random variable X that takes on a finite set of values with probabilities $p_1, p_2, \dots, p_n$, that is $H(X) = -\sum_k p_k \ln p_k$. Show that the following statements are true. [8]

i. $H(p_1, \dots, p_n)$ is at its maximum when $p_1 = p_2 = \dots = p_n$.
ii. $H(p_1, \dots, p_n) = H(p_{\pi(1)}, \dots, p_{\pi(n)})$ for any permutation $\pi$ of $1, \dots, n$.
iii. $H(p_1, \dots, p_n) \ge 0$, where the equality holds only when one of the $p_i$'s is 1.
iv. $H(p_1, \dots, p_n, 0) = H(p_1, \dots, p_n)$.
v. $H\left(\frac{1}{n}, \dots, \frac{1}{n}\right) < H\left(\frac{1}{n+1}, \dots, \frac{1}{n+1}\right)$.
vi. $H(p_1, \dots, p_n)$ is a continuous function of its arguments.
vii. $H\left(\frac{1}{mn}, \dots, \frac{1}{mn}\right) = H\left(\frac{1}{m}, \dots, \frac{1}{m}\right) + H\left(\frac{1}{n}, \dots, \frac{1}{n}\right)$.
viii. Let $p = p_1 + \dots + p_m$ and $q = q_1 + \dots + q_n$, where $p + q = 1$, $p$ and $q$ are positive and the $p_i$ and $q_j$ are non-negative. Then

$$H(p_1, \dots, p_m, q_1, \dots, q_n) = H(p, q) + p\,H\left(\frac{p_1}{p}, \dots, \frac{p_m}{p}\right) + q\,H\left(\frac{q_1}{q}, \dots, \frac{q_n}{q}\right).$$
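Properties such as iv, v, vii and viii can be spot-checked numerically before they are proved. The sketch below uses natural logarithms, as in the definition of H(X), and takes $0 \ln 0 = 0$, which is what makes iv hold; the probabilities used for viii are arbitrary example values of ours, not taken from the text.

```python
import math


def H(*probs: float) -> float:
    """Entropy -sum p ln p in nats, taking 0 ln 0 = 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def uniform(k: int) -> list[float]:
    """The distribution (1/k, ..., 1/k)."""
    return [1 / k] * k


m, n = 3, 4
# vii: H(1/mn, ..., 1/mn) = H(1/m, ..., 1/m) + H(1/n, ..., 1/n)
lhs_vii = H(*uniform(m * n))
rhs_vii = H(*uniform(m)) + H(*uniform(n))

# viii: grouping rule, with example probabilities
ps, qs = [0.10, 0.20, 0.15], [0.30, 0.25]
p, q = sum(ps), sum(qs)
lhs_viii = H(*ps, *qs)
rhs_viii = H(p, q) + p * H(*(x / p for x in ps)) + q * H(*(y / q for y in qs))

print(lhs_vii, rhs_vii)
print(lhs_viii, rhs_viii)
```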


3. 

a. Give the meaning and explanation of the terms cryptography, cryptanalysis, cryptology, key, plain text, code, cipher, encode, decode, encrypt, decrypt and nondeterministic polynomial-complete. [6]

b. Caesar cipher shifts all message letters three positions to the right on the 
ordered list of characters of the roman alphabet. Encode the previous 
sentence using Caesar cipher. [3] 
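A minimal sketch of the shift described in part b, assuming the 26-letter Roman alphabet with Z wrapping round to A and letter case preserved. The question does not say what happens to spaces and punctuation, so leaving every non-letter unchanged is an assumption of ours.

```python
import string


def caesar(text: str, shift: int = 3) -> str:
    """Shift each letter `shift` places to the right, wrapping Z round to A."""
    out = []
    for ch in text:
        if ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        elif ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        else:
            out.append(ch)  # non-letters are left as they are (assumption)
    return "".join(out)


sentence = ("Caesar cipher shifts all message letters three positions to the right "
            "on the ordered list of characters of the roman alphabet.")
print(caesar(sentence))
```

Decryption is the same function with `shift=-3`.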

c. One of the most basic encryption algorithms is the transposition of order d, in which the message is divided into blocks of length d and then a permutation π of 1, 2, ..., d is applied to each block. As an example, when d = 5 and π = {34521}, Shakespeare’s ‘Fair is foul, and foul is fair’ is encrypted into ‘rfai fis a ul,siul ri fa’. Again, let d = 5 and π = {53124}, then try to decrypt ‘thi w cyrms-sbo iw ooth se hatatbrls s o’. Find what the original message is. [6]
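The transposition cipher of part c can be sketched as follows. The question does not fix whether π says where each character goes or where it comes from; this sketch assumes that character i of a block moves to position π(i), and pads a short final block with spaces, both assumptions of ours. Under the other convention the roles of encryption and decryption swap, so the sketch is not claimed to reproduce the cipher text printed above.

```python
def check_perm(perm: list[int]) -> int:
    """Return d after checking that perm is a permutation of 1, ..., d."""
    d = len(perm)
    if sorted(perm) != list(range(1, d + 1)):
        raise ValueError("not a permutation of 1..d")
    return d


def transpose_encrypt(message: str, perm: list[int], pad: str = " ") -> str:
    """Assumed convention: character i of each block moves to position perm[i]."""
    d = check_perm(perm)
    if len(message) % d:
        message += pad * (d - len(message) % d)  # hypothetical padding rule
    out = []
    for start in range(0, len(message), d):
        block = message[start:start + d]
        new_block = [""] * d
        for i, target in enumerate(perm):
            new_block[target - 1] = block[i]
        out.append("".join(new_block))
    return "".join(out)


def transpose_decrypt(cipher: str, perm: list[int]) -> str:
    """Inverse of transpose_encrypt; len(cipher) must be a multiple of d."""
    d = check_perm(perm)
    if len(cipher) % d:
        raise ValueError("cipher text length must be a multiple of d")
    return "".join(
        "".join(cipher[start + target - 1] for target in perm)
        for start in range(0, len(cipher), d)
    )


plain = "Fair is foul, and foul is fair"
secret = transpose_encrypt(plain, [3, 4, 5, 2, 1])
print(repr(secret), transpose_decrypt(secret, [3, 4, 5, 2, 1]) == plain)
```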

4. Write an essay of approximately 25 lines on any one of the following topics: group, polynomial, Hamming, linear, or cyclic codes; MDS, Goppa, Hadamard, or quadratic residue codes; the automorphism group of a code; entropy and mutual information; or cryptography. [10]
Students’ scores
Coding Theory
2005-6


14th January, 2007


Kit Tyabandha, PhD


Quiz 1
Quiz 1 was done on 20 January 2006. There were three questions.

Name          ID        Q1   Q2    Q3     Total
Kantadita     4505016   7    3     5      15
Jaranya       4505039   7    0.2   12.1   19.3
Nathabol      4505054   7    3     4.7    14.7
Dhaneés       4505071   7    1     3.4    11.4
Rattiya       4505181   7    3     4.7    14.7
Rattayaporn   4505183   7    2     4.9    13.9
Vauranan      4505188   7    2     4.8    13.8
Saniwan       4505216   7    1     4.2    12.2
Vasana        4505204   7    2     4.9    13.9
Siridibya     4505220   7    0     4.9    11.9


Table 6 Students’ Quiz 1 scores.


The scores and ranks for Quiz 1 are shown in Table 7.


ID      score   rank
45016   15      2
45039   19.3    1
45054   14.7    3
45071   11.4    8
45181   14.7    3
45183   13.9    4
45188   13.8    5
45204   13.9    4
45216   12.2    6
45220   11.9    7


Table 7 Mark and rank of students’ scores from Quiz 1.


The total final score for Quiz 1 is 10. On this scale the mean is 7.04, the median 6.95, the minimum 5.70 and the maximum 9.65. The standard deviation is 1.11. Figure 8 shows the distribution.
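The summary statistics can be reproduced from the raw Quiz 1 totals of Table 7. Halving them gives the scaled Quiz 1 column of Table 15, and the quoted standard deviation of 1.11 agrees with the sample standard deviation (divisor n − 1) computed by Python's `statistics.stdev`, rather than the population version `statistics.pstdev`. Student 45092, who has no Quiz 1 mark, is left out.

```python
import statistics

# Raw Quiz 1 totals from Table 7
raw = {"45016": 15, "45039": 19.3, "45054": 14.7, "45071": 11.4, "45181": 14.7,
       "45183": 13.9, "45188": 13.8, "45204": 13.9, "45216": 12.2, "45220": 11.9}
scaled = [score / 2 for score in raw.values()]  # scaled to a total of 10

print("mean  ", round(statistics.mean(scaled), 2))
print("median", statistics.median(scaled))
print("min   ", min(scaled), "max", max(scaled))
print("stdev ", round(statistics.stdev(scaled), 2))  # sample standard deviation
```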
[Figure: Score distribution, Coding Theory, Quiz 1, 2005-6; horizontal axis: score]


Figure 8 Distribution of students’ Quiz 1 scores.


Midterm Exam 


Midterm Exam was done on 27 January 2006. There were two questions of 15 marks each, and the total of 30 was later scaled down to 20. Table 8 gives the scores for each question.


ID 

45016 
45039 
45054 
45071 
45092 
45181 
45183 
45188 
45204 
45216 
45220 


M1(15) 
14.5 


M2(15) 
15 
3.8 


3.8 
11.4 


Table 8 Mark and rank of students’ scores from Midterm Exam. 


The total final score for the midterm exam is 20. On this scale the mean is 17.05, the median 17.47, the minimum 12.33 and the maximum 20. The standard deviation is 3.17. The scaled scores are given in Table 9.
ID      score   rank
45016   19.67   3
45039   12.53   8
45054   17.47   5
45071   12.33   9
45092   16.67   7
45181   19.80   2
45183   20      1
45188   19.40   4
45204   20      1
45216   12.53   8
45220   17.13   6


Table 9 Mark and rank of students’ scores from Midterm Exam.


Figure 9 shows the distribution of midterm scores.
[Figure: Score distribution, Coding Theory, Midterm Exam, 2005-6; axes: score, number of students]


Figure 9 Distribution of students’ Midterm scores.


Quiz 2


Quiz 2 was done on 3 February 2006. There were two questions. The total mark was 30, which was later scaled down to 10. Table 10 gives the scores for each question.
ID      Q1     Q2
45016   11.5   10
45039   0.5    2.5
45054   19     9.8
45071   6      1.3
45092   8      6.3
45181   8      5.7
45183   8      1
45188   4.8    7
45204   1      7.1
45216   8      5
45220   7.3    1.5
Table 10 Mark and rank of students’ scores from Quiz 2.


Figure 10 shows the distribution of Quiz 2 scores. 


[Figure: Score distribution, Coding Theory, Quiz 2, 2005-6; axes: score, number of students]


Figure 10 Distribution of students’ Quiz 2 scores.


For Quiz 2 the scaled total score is 10. After scaling, the mean is 4.22, the median 3.93, the minimum 1, the maximum 9.60 and the standard deviation 2.39. Table 11 gives the scaled scores together with the ranking.
ID      score   rank
45016   7.17    2
45039   1       11
45054   9.60    1
45071   2.43    10
45092   4.77    3
45181   4.57    4
45183   3       7
45188   3.93    6
45204   2.70    9
45216   4.33    5
45220   2.93    8


Table 11 Mark and rank of students’ scores from Quiz 2.


Quiz 3


Quiz 3 took place on 10 February 2006. There was only one question, worth 20 marks, which was later scaled down to 10.


ID      score   rank
45016   16.7    1
45039   15.2    3
45054   16.7    1
45071   16.7    1
45092   16.7    1
45181   16.7    1
45183   16.7    1
45188   16.7    1
45204   16.6    2
45216   16.7    1
45220   11.8    4


Table 12 Scores and ranks of Quiz 3.


For Quiz 3 the scaled mean is 8.05, the median 8.35, the minimum 5.90, the maximum 8.35 and the standard deviation 0.75.
[Figure: Score distribution, Coding Theory, Quiz 3, 2005-6]
Figure 11 Distribution of students’ Quiz 3 scores.


Final Exam 


Final Exam was held on 23 February 2006. There were four questions, adding up to 50 marks in total, later scaled down to 30.


ID      score   rank
45016   9.7     8
45039   8.6     9
45054   31      1
45071   10.9    6
45092   19.4    2
45181   9.9     7
45183   17.3    4
45188   18.6    3
45204   14.6    5
45216   7.9     10
45220   7.3     11


Table 13 Scores and ranks of Final Exam.


The Final Exam scores are scaled from the original total of 50 to 30. After scaling, the mean is 8.47, the median 6.54, the minimum 4.38, the maximum 18.60 and the standard deviation 4.27. Figure 12 gives the plot of the distribution of the scores.
Figure 12 Distribution of students’ final exam scores.


Practice and attendance


Then there are marks from practice and attendance. These are listed in Table 14.


Table 14 Practice and attendance.


ID      Practice   Attendance
45016   10         8.5
45039   9          8
45054   10         10
45071   10         10
45092   10         10
45181   10         9
45183   10         9.5
45188   10         10
45204   10         9.5
45216   10         10
45220   10         8.5


Total Scores 


The total scores are given in Table 15. 
ID      Quiz 1  Quiz 2  Quiz 3  Midterm  Final  Practice  Attendance
        (10)    (10)    (10)    (20)     (30)   (10)      (10)
45016   7.5     7.17    8.35    19.67    5.82   8.5       10
45039   9.65    1       7.6     12.53    5.16   8         9
45054   7.35    9.6     8.35    17.47    18.6   10        10
45071   5.7     2.43    8.35    12.33    6.54   10        10
45092   -       4.77    8.35    16.67    11.64  10        10
45181   7.35    4.57    8.35    19.8     5.94   9         10
45183   6.95    3       8.35    20       10.38  9.5       10
45188   6.9     3.93    8.35    19.40    11.16  10        10
45204   6.95    2.7     8.3     20       8.76   9.5       10
45216   6.1     4.33    8.35    12.53    4.74   10        10
45220   5.95    2.93    5.9     17.13    4.38   8.5       10


Table 15 Total score, Coding Theory, second term, 2005-6.


The total score has mean 63.46, median 65.01, minimum 52.94, maximum 81.37 and standard deviation 8.46. The rank is shown together with the score for each student in Table 16.


ID      score   rank
45016   67      4
45039   52.94   11
45054   81.37   1
45071   55.36   9
45092   61.42   7
45181   65.01   6
45183   68.18   3
45188   69.74   2
45204   66.21   5
45216   56.06   8
45220   54.8    10


Table 16 Total score and rank, Coding Theory, 2005-6.
[Figure: Score distribution, Coding Theory, Total Score, 2005-6]
Figure 13 Distribution of students’ total score, Coding Theory.


For grades, we look at two candidate grading schemes, as shown in Table 17. Scheme A more closely resembles a hard grading scheme, that is to say, one which is independent of the students’ performance. Scheme B is tied to the ranges used in drawing the histogram of Figure 13. In that figure there appear three clusters of scores, which we make correspond to the three grades given, that is A, B+ and B.


Grade   Mark range
A       (75, 100]
B+      (60, 75]
B       (50, 60]


Table 17 Grading schemes.


According to these schemes, the grades are as shown in Table 18.
ID      Score   Grade
45016   67      B+
45039   52.94   B
45054   81.37   A
45071   55.36   B
45092   61.42   B+
45181   65.01   B+
45183   68.18   B+
45188   69.74   B+
45204   66.21   B+
45216   56.06   B
45220   54.8    B


Table 18 Grades according to our grading scheme of Table 17.
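The totals, ranks and grades can be recomputed from Table 15. The sketch below counts the missing Quiz 1 mark of 45092 as 0 and applies the mark ranges of Table 17; because the component marks are themselves rounded to two decimals, a few recomputed totals differ from Table 16 by 0.01. The function and variable names are illustrative.

```python
# Component marks from Table 15:
# Quiz 1, Quiz 2, Quiz 3, Midterm, Final, Practice, Attendance.
# 45092 has no Quiz 1 mark; it is counted as 0 here.
marks = {
    "45016": [7.5, 7.17, 8.35, 19.67, 5.82, 8.5, 10],
    "45039": [9.65, 1, 7.6, 12.53, 5.16, 8, 9],
    "45054": [7.35, 9.6, 8.35, 17.47, 18.6, 10, 10],
    "45071": [5.7, 2.43, 8.35, 12.33, 6.54, 10, 10],
    "45092": [0, 4.77, 8.35, 16.67, 11.64, 10, 10],
    "45181": [7.35, 4.57, 8.35, 19.8, 5.94, 9, 10],
    "45183": [6.95, 3, 8.35, 20, 10.38, 9.5, 10],
    "45188": [6.9, 3.93, 8.35, 19.40, 11.16, 10, 10],
    "45204": [6.95, 2.7, 8.3, 20, 8.76, 9.5, 10],
    "45216": [6.1, 4.33, 8.35, 12.53, 4.74, 10, 10],
    "45220": [5.95, 2.93, 5.9, 17.13, 4.38, 8.5, 10],
}


def grade(total: float) -> str:
    """Mark ranges of Table 17: (75, 100] A, (60, 75] B+, (50, 60] B."""
    if total > 75:
        return "A"
    if total > 60:
        return "B+"
    if total > 50:
        return "B"
    return "?"  # Table 17 gives no range at or below 50


totals = {sid: sum(row) for sid, row in marks.items()}
ranking = sorted(totals, key=totals.get, reverse=True)
rank = {sid: i + 1 for i, sid in enumerate(ranking)}
for sid in marks:
    print(sid, round(totals[sid], 2), rank[sid], grade(totals[sid]))
```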
Kinder is a German word that means ‘children’. So the name of the place should more correctly be written ‘Kinderscout’. But then again there are Kinder Downfall and Kinder Low. To do the same thing everywhere would probably result in something seemingly out of place sitting in an English context. Therefore the name is normally written ‘Kinder Scout’.


Kit Tyabandha 
Bangkok, April 2006 